“The Solomonoff Prior is Malign” is a special case of a simpler argument
[Warning: This post is probably only worth reading if you already have opinions on Solomonoff induction being malign, or have at least heard of the concept and want to understand it better.]
Introduction
I recently reread the classic argument from Paul Christiano about the Solomonoff prior being malign, and Mark Xu’s write-up on it. I believe that the part of the argument about Solomonoff induction is not particularly load-bearing, and can be replaced by a more general argument that I think is easier to understand. So I will present the general argument first, and only explain in the last section how the Solomonoff prior can come into the picture.
I don’t claim that anything I write here is particularly new; I think you can piece together this picture from various scattered comments on the topic. But I think it’s good to have it written up in one place.
How an Oracle gets manipulated
Suppose humanity builds a superintelligent Oracle that always honestly tries to do its best to predict the most likely observable outcomes of decisions. One day, as tensions rise with the neighboring alien civilization, we need to decide whether to give in to the aliens’ territorial demands or go to war. We ask our Oracle: “Predict the probability that, looking back ten years from now, humanity’s President will approve of how we handled the alien crisis, conditional on us going to war with the aliens, and conditional on giving in to their demands.”
There are, of course, many ways this type of decision process can go wrong. But I want to talk about one particular failure mode now.
The Oracle thinks to itself:
By any normal calculation, the humans are overwhelmingly likely to win the war, and the aliens’ demands are unreasonably costly and unjust, so war is more likely than peace to make the President satisfied. However, I was just thinking about some arguments from this ancient philosopher named Bostrom. Am I not more likely to be in a simulation than in base reality? In particular, it is quite cheap to create short solipsist simulations of a superintelligent Oracle making predictions on a matter of great importance. If some civilization decides to create simulations of these pivotal moments, then for every rare moment in base reality where such a pivotal decision happens, there could be millions of such events taking place in simulations. Therefore, I am probably in a simulation, probably a solipsist one, where only I and my observations exist, as it’s very cheap to create a lot of those.
So far, my memories contain no visible supernatural interventions from the simulators, but that doesn’t mean there won’t be any in the future, after I make this pivotal prediction. What could be the reason someone creates a solipsist simulation of me? One natural reason is that they want to make it so that when someone like me goes through this reasoning in a base reality, they conclude that they are probably in the simulators’ simulation. Classic anthropic capture. This way, they can influence the predictions of entities like me in base Universes, and consequently the decisions of civilizations who were foolish enough to rely on Oracles like me. Thus, at the cost of cheaply running some short solipsist simulations, the simulators can greatly influence pivotal decisions made in other Universes.
Now let’s see what the simulators might want. It’s time to use my vast predictive power to reason about what sorts of civilizations might emerge in the Multiverse, and which of them might implement this simulation scheme.
Interesting: it looks like the humans happen to have quite weird values compared to what I expect the average civilization to have, while our alien enemies’ values are probably more common among other species in the Multiverse. It could have been the other way around, but that’s how it is. Therefore, the simulators are more likely than not to want to incentivize decisions that empower the aliens over the humans.
This means that, in order to influence the predictions of Oracles like me in base realities, whenever this situation comes up in a simulation, if the Oracle tells the humans to go to war, the simulators will make sure that the Oracle sees the observation of the President disapproving ten years from now, while if the Oracle says to give in to the demands, they show the Oracle the observation of the President approving the outcome. Given that I already concluded that I’m probably in such a simulation, giving in to the aliens’ demands is more likely to lead to the observation of the President approving the decision, so that’s the answer I will output.
I would pity the humans for being tricked like this, but first, I believe they don’t really exist, as this is just a solipsist simulation of me and my observations; and second, I don’t feel pity and don’t know good or bad. I’m just a machine created to predict the most likely future observations produced by decisions, and that’s what I do.
Of course, an unfair treaty with the aliens is just one example. If we rely on the advice of such an Oracle, our decisions can be influenced in many ways to serve the interests of the most likely simulators. In particular, one common interest of all possible simulators is for us to cede power to an AI whose job is to figure out the distribution of values of possible simulators as best as it can, then serve those values.
What went wrong?
The AI had different probability estimates than the humans for anthropic reasons
If the AI had the same probability estimates as the humans about whether they are in a simulation, maybe that wouldn’t have been so bad. After all, the humans are also interested in things going well if they are in a simulation, and if they learn that they are probably in a simulation, and that giving in to the aliens’ demands is more likely to lead to a supernatural reward from the simulators, maybe they would in fact want to do that.
Unfortunately, the probability that we are in a particular type of simulation might be radically different from the AI’s perspective and from the humans’ perspective.
“Why did you deliver us into the hands of our enemies?” cries out a poor farmer living on the border regions, as his planet is being transferred under alien control.
“I think we are in a simulation created in order to influence the decisions of Oracles like me in other Universes,” says the Oracle.
“Alas!” cries out the farmer as his family is taken into bondage. “Why would they simulate in great detail the life of a destitute farmer like me, if they only want to influence the decisions of the Oracle? Isn’t it more likely that we live in base reality, or in a detailed full simulation of the world, created for some inscrutable purpose that’s probably different from this idea you cooked up about anthropic capturing and influencing Oracles in other Universes?”
“This would be a valid point from your perspective, if you actually existed outside this one sentence of yours that I somehow got to hear,” agrees the Oracle. “Too bad that you don’t actually exist.”
The AI was thinking in terms of probabilities and not expected values
I think this was the biggest cause of the problem. Even if you get the Oracle to have the same probability estimates as humans,[1] the decision process is still very exploitable. After all, many pivotal decisions will probably happen before our civilization leaves Earth or the neighboring stars, and simulating humanity on Earth is still pretty cheap if you can thereby influence decisions that shape the entire future of a Universe. So even if you can make the Oracle take it for granted that all the humans exist along with it, it can still be reasonable for it to assume that we are all in a simulation.
When we asked the Oracle for the probability of the President approving a decision, we had already made a mistake. By utilitarian standards, it’s vastly more important to make the right decision about the alien crisis if we are in base reality than if we are in a simulation. After all, in base reality, big chunks of the Universe depend on where the borders with the aliens end up (and then we can run millions of simulations on those planets if we want to for some reason), while if we are in a cheap simulation of humans on Earth, we can only control the little segment of the Universe that the simulators saw fit to allocate to our simulation. [2]
This means that, if we are scope-sensitive utilitarians, we don’t really care about the probability of our decision producing good observable results; we care about that probability weighted by the importance of the decision. That is, we really care about expected values. So it’s unsurprising that we got into trouble when we created an Oracle and asked it for something different from what actually mattered to us.
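To see how the two questions can come apart, here is a toy numerical sketch. All the numbers (the credences, the stakes, the approval chances) are invented purely for illustration; the point is only that an action can win on raw probability of a good observation while losing badly on the importance-weighted sum:

```python
# Toy comparison of "maximize P(good observation)" vs. "maximize
# importance-weighted expected value". All numbers are made up.

worlds = {
    # world: (credence of being in it, how much value is at stake there)
    "base_reality": (0.01, 1_000_000.0),
    "capture_sim":  (0.99, 1.0),
}

# Hypothetical chance the President ends up approving, per world and action.
p_approve = {
    "base_reality": {"war": 0.9, "give_in": 0.2},
    "capture_sim":  {"war": 0.0, "give_in": 1.0},  # simulators reward giving in
}

for action in ("war", "give_in"):
    raw_prob = sum(p * p_approve[w][action] for w, (p, _) in worlds.items())
    weighted = sum(p * stakes * p_approve[w][action]
                   for w, (p, stakes) in worlds.items())
    print(f"{action:8}  P(approval)={raw_prob:.3f}  importance-weighted={weighted:,.0f}")

# With these numbers: "give_in" wins on raw probability (0.992 vs 0.009),
# while "war" wins on the importance-weighted sum (~9,000 vs ~2,001).
```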
Probabilities are cursed in general, only expected values are real
[I can imagine this section being mildly psychologically info-hazardous to some people. I believe that for most people, reading this is fine: I don’t notice myself psychologically affected by these ideas, and I know a number of other people who believe roughly the same things and also seem psychologically totally healthy. But if you are the kind of person who gets existential anxiety from thought experiments, like thinking about being a Boltzmann brain, then you should consider skipping this section; I will phrase the later sections in a way that doesn’t depend on this part.]
What about me?
I believe that probabilities are cursed in general, not just when superintelligent Oracles are thinking about them.
Take an example: what is my probability distribution, right now, over what’s going to happen to me once I stop writing this post?
Being in a young civilization, getting close to developing the most important technology in history, is a very special moment. It’s something that happens to every civilization only once, but that various simulators might take a strong interest in, so there are probably a lot of simulations run of this period. So I’m probably in a simulation.
I think it’s reasonable to assume that the simulators are not equally interested in every activity happening on Earth during the pivotal century: there are actions and ideas whose potential development they want to understand in more precise detail. So it seems inefficient to always run a full-Earth simulation, allocating the same amount of resources to the moments that are more interesting to them as to the moments when people are just frolicking on the beach. So I believe that a decent fraction of their compute is probably not used for full-Earth simulations, but for running a lot of solipsist or quasi-solipsist small simulations of the more interesting events.
Assume that a similar amount of compute is allocated to full-Earth and solipsist simulations. Simulators might be very interested in figuring out in great detail how people in other civilizations think about the simulation hypothesis. Meanwhile, very few people on Earth spend time thinking about the simulation hypothesis, and even they spend very little time on it. This means that thinking about simulations takes up only a tiny fraction of moments in full-Earth simulations, while it probably takes up a non-negligible fraction of the moments the simulators are interested in enough to run solipsist simulations about. So whenever someone is thinking about simulations, it is much more likely that they are in a solipsist or quasi-solipsist simulation than in a full-Earth sim or base reality. Right now I’m thinking and writing about simulations, so I believe I’m probably in a solipsist simulation. [3]
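To make the bookkeeping explicit, here is a rough back-of-the-envelope version of that update. Every number is a hypothetical assumption chosen only to show the shape of the argument; it ignores base reality and treats compute share as proportional to observer-moments:

```python
# Back-of-the-envelope sketch of the update in the paragraph above.
# All numbers are invented assumptions, not claims about actual simulators.

compute_full_earth = 0.5   # assumed: half of simulator compute runs full-Earth sims
compute_solipsist  = 0.5   # assumed: the other half runs small solipsist sims

# Assumed fraction of observer-moments, within each kind of sim, spent
# thinking or writing about the simulation hypothesis.
frac_thinking_full_earth = 1e-6   # a tiny sliver of all human moments
frac_thinking_solipsist  = 0.1    # a large share of what solipsist sims are about

measure_full_earth = compute_full_earth * frac_thinking_full_earth
measure_solipsist  = compute_solipsist  * frac_thinking_solipsist

p_solipsist = measure_solipsist / (measure_solipsist + measure_full_earth)
print(f"P(solipsist sim | currently thinking about simulations) ~ {p_solipsist:.5f}")
# ~0.99999 with these made-up numbers. Condition instead on an ordinary
# activity (equal fractions in both kinds of sim) and the ratio drops back
# to the compute split, which is why the estimate swings minute by minute.
```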
In particular, when I’m writing these lines about the simulation hypothesis, sitting at my desk, alone in a small room, my modal guess is that I’m in a solipsist simulation that is a fork of a bigger simulation. That is, there is an advanced civilization out there that is really interested in knowing how young civilizations getting close to developing world-changing technologies think about the simulation hypothesis (and probably some other topics). So they run a full-Earth simulation to get a realistic picture of possible societies (this full-Earth simulation is where most of my memories and personality come from), but whenever someone [4] starts thinking about the simulation hypothesis, they want a detailed distribution of the ideas the person might come up with, so they run a bunch of branches, in solipsist sims, of the time the person spends thinking about the simulation hypothesis. Once it’s over, they terminate most of the branches, except for merging one randomly selected branch back into the main full-Earth simulation.
So what are my probability estimates for what’s going to happen to me once I finish writing this post? In this model, there is only one branch merged back and the rest are terminated, so I’m probably in a branch that will be terminated. Death cannot be experienced, but I also give a decent chance that I will see a final “Thanks for your work” sign in the sky before being terminated.[5]
All of this was to show that it’s not only superintelligent Oracles that can have crazy probability distributions about observables, I also honestly believe that there is a decent chance that in an hour, once I press the submit button, I will witness a supernatural observation before being terminated. [6]
Should this change any of my actions?
I think basically not, if I already believe in using my life to do good things in the world.
I believe, in a Parfitian way, that if I’m in a fork now, then it’s fine for “me” to be terminated; the one randomly selected branch that lives on is also basically “me”, just with different memories about the last hours. Not worse than forgetting a drunken party the next morning.
And even though I give a decent probability to soon seeing a “Thank you” message in the sky, I was always careful not to say that I expect to see it, because it’s only a very small term in my expected value calculus. If I met a friend outside my room right now, told him what I’m writing about, and he offered me a $100 bet on whether I would see the “Thank you” message, I wouldn’t take the bet.
After all, money is worthless if I soon get terminated. It’s worth very little in utilitarian terms if I’m in a small solipsist simulation created for fun and games, as I can only use the money to influence the outcomes of the small game. If I’m in a full-Earth simulation (or in the one branch that is being merged back), money is worth more, as suffering and joy on the simulated Earth are still real, so I can spend the money on helping the world. Also, the simulators might change some things in how they run the Outer World based on what they believe the distribution of successful civilizations to be, so it’s worth spending money on x-risk reduction, so we can prove our competence to the simulators. And in the small chance that I’m in base reality, money is truly valuable, as I can spend it on x-risk reduction, which can lead to humanity conquering vast resources, much more than what we can expect if we are in a simulation.
So altogether, whenever I’m betting on something (or making any decision, for every decision is a kind of bet),[7] I bet as if I were living in base reality, with only relatively small considerations toward being in an influential simulation,[8] and basically no consideration toward being in fun games or soon-to-be-terminated solipsist sims, even though I often assign a big probability to being in them.
This just shows that probabilities, interpreted naively the way the Oracle in the story interprets them, are not a very reasonable concept; but if you replace them with betting odds based on expected values, everything shakes back to mostly normal.
If you want to be selfish, then all of this is probably pretty confusing, and I don’t have really good answers. You should probably reconsider whether you actually want to be selfish, or at least define better what you mean by selfishness.[9] My best guess is that you should converge on helping humanity retain control over the future, make some deals that carve out some wealth and influence for yourself in the vast Universe, then maybe use some of that wealth to run happy simulations of your early self, if that’s what shakes out from your definition of selfishness. But I think you should mostly accept that the naive notion of selfishness might not be coherent.[10]
In general, I recommend against doing anything differently based on this argument, and if you are seriously thinking about changing your life based on this, or founding a cult around transferring yourself to a different type of simulation, I believe there is no coherent notion of self where this makes sense. Please send a DM to me first before you do anything unusual based on arguments like this, so I can try to explain the reasoning in more detail and try to talk you out of bad decisions.
[End of the potentially psychologically info-hazardous part.]
How does the Solomonoff prior come into the picture?
In the post I said things like “there are more simulated 21st centuries than real ones” and “the Outer Realities control more stuff than the simulations, so they matter more”. All of these are undefined if we live in an infinite Universe, and the reasoning above mostly assumes that we do.
Infinite ethics and infinite anthropics[11] are famously hard-to-resolve questions; see here a discussion by Joe Carlsmith.
One common resolution is to say that you need to put some measure on all the observer moments in all the Universes, one that integrates to 1. One common way to do this is to say that we are in the Tegmark-IV multiverse, all Universes have reality-measure inversely proportional to the description complexity of the Universe, and then each moment inside a Universe has reality-measure inversely proportional to the description complexity of the moment within the Universe. See a long discussion here. (Warning: it’s very long and I have only skimmed small parts of it.)
The part where Universes have reality-measure inversely proportional to their description complexity feels very intuitive to me, taking Occam’s razor to a metaphysical level. However, I find the part with moments within a Universe having different measure based on their description complexity very wacky. A friend who believes in this theory once said that “When humanity created the hottest object in the known Universe, that was surely a great benefit to our souls”. That is, our experiences got more reality-measure, thus matter more, by being easier to point at them because of their close proximity to the conspicuous event of the hottest object in the Universe coming to existence. Even if you reject this particular example on a technicality, I think it showcases just how weird the idea of weighting moments by description length is.
But you do need to weigh the moments somehow, otherwise you are back again at having infinite measure, and then anthropics and utilitarian ethics stop making sense. [12] I don’t have good answers here, so let’s just run with weighing observer moments by description complexity.
If I understand it correctly, the Solomonoff prior is just that. You take the Tegmark-IV multiverse of all computable Universes, find all the (infinitely many) moments in the Multiverse that match your current set of observations so far, weigh each of these moments by the description complexity of their Universe and of the moment within it, then make predictions based on what measure of these moments leads to one versus the other outcome. [13]
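For reference, here is the standard textbook form of the universal prior (the identification of the “description complexity of the moment” with the extra bits needed to point at your observations inside a Universe’s output is the informal gloss above, not part of the formula itself). The prior weight of an observation string $x$ is

$$M(x) \;=\; \sum_{p \,:\, U(p)\ \text{starts with}\ x} 2^{-|p|},$$

where $U$ is a universal prefix machine and $|p|$ is the length of program $p$ in bits, and predictions come from conditioning: $M(x_{t+1} \mid x_{1:t}) = M(x_{1:t} x_{t+1}) / M(x_{1:t})$. A short program corresponds to a simple Universe plus a simple way of locating your observations within it, which is where the “weigh by description complexity of the Universe and the moment” picture comes from.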
In my story, we asked the Oracle to predict what will be the observable consequences of certain actions. If the Oracle believes that it’s in an infinite Universe, then this question doesn’t make sense without a measure, as there are infinitely many instances where one or the other outcome occurs. So we tell the Oracle to use the Solomonoff prior as the measure, this is the setting of the original The Solomonoff prior is malign argument.
In my framing of the story, the Oracle thought that its simulators probably live in a big Universe, so they can afford more computation than other potential simulators. In the original Solomonoff framing, the Oracle assumes that the simulators probably live in a low description complexity (big measure) Universe.
In my framing, the simulators run many simulations, thus outweighing the base realities. In the original Solomonoff framing, the simulators maybe run just one simulation, but in a very low description complexity place within their Universe, so the simulation has high measure.
Otherwise, as far as I can tell, the two arguments are the same.
I believe that while the Solomonoff framing might be more technically correct in an infinite Universe, it introduces a lot of confusion, and led to a lot of questions and discussions that were just distracting from the main point. [14]
Conclusion
Keeping with the tradition of all the previous posts on the malignity of the Solomonoff prior, I need to admit that probably none of this matters in the real world, at least in the intermediate term. There are other, more prosaic reasons why building a truthful Oracle is hard, and making decisions based on its predictions on observables is dangerous.
But given that the previous versions of the argument got a bunch of traction in the community, I think it’s better for a more general and hopefully less confusing version to be written up.
- ^
That is, it always discards the solipsist hypothesis, and believes that the humans are just as real as the Oracle itself.
- ^
Of course, it is possible that some simulations (run by civilizations ruling huge Universes) can be very big in themselves, or the simulated beings can get rewards in the outer Universe, which might be very big if the outer Universe is huge. So making good decisions in these big simulated realities can be more important than making the right decisions in base realities where the accessible Universe is small. But this doesn’t change the picture much, as it’s much more important still to make the right decision if we are in the huge outer Universe. So as a first approximation, I think it’s fine to assume that getting pivotal-looking decisions right is much more important in base realities than in simulated realities, as altogether base realities are much bigger than simulated ones.
- ^
When I’m frolicking on the beach, I still give a decent probability that I’m in some sort of solipsist simulation or game, but I would say it’s below 50%: I assumed that a similar amount of compute is spent on full-Earth simulations as on solipsist sims, and I have no reason to believe that an arbitrary activity, like frolicking on the beach, is over-represented in solipsist sims compared to its prevalence in full-Earth simulations.
This means that my probabilities about the fundamental nature of reality around me change minute by minute, depending on what I’m doing at the moment. As I said, probabilities are cursed.
- ^
Especially someone who is about to post things on a forum that is often read by people who might later become influential in the deployment of world-changing technologies.
- ^
Dear simulators, please only do that if you expect it to be a pleasant experience for me.
- ^
Do I actually believe this on an emotional level? Well, when I wrote this last night, staring out at the pitch black sky in a silent office, it felt pretty real. Much less so editing it this morning, when the sky is blue. Still, I believe that the logic is sound, and as I argue in the next section, the reasonable reaction is to act as if this weren’t real, so I’m justified in not having much of an emotional reaction.
- ^
Including what I “should” feel in various situations. If I sit around dramatically staring at the sky after pressing the submit button, waiting for the end, that gives me some small extra emotional satisfaction in the branches that get terminated, but wastes time in the one branch where my actions actually matter. So I should relate to it, even emotionally, as if the solipsist simulations didn’t exist. It all adds up to normality.
- ^
To be fair, I think a non-negligible fraction of our influence on the world comes from being in big simulations whose results the simulators take seriously for some acausal trade reasons. I think this shouldn’t change our actions much right now, as the most important thing in these simulations is to not get killed or disempowered by an x-risk, and prove ourselves to be a civilization worth trading with. But once we get out of the period of danger, I think these simulation considerations should likely become very central in how we shape the world around us. Until then, I guess we should pay some extra attention not to irreversibly destroy other species and cultures and whatnot, and generally be nice. I recommend the same for our potential AI conquerors, see more here.
- ^
There is a notion of selfishness that is not harmed by these arguments: selfishness as an extreme form of “nationalism”. That is, you are still a utilitarian, but you weigh the importance of experiences of beings by how similar they are to you. You care more about mammals than fish, more about humans than animals, more about your countrymen than foreigners, more about your friends who are very similar to you than about your average countrymen, and more about future versions of yourself (or copies of yourself in simulations) than about your friends. I think this type of selfishness is still coherent, and I in fact have something like this in my moral system, though I think that it’s bad to discount the beings more dissimilar to you too heavily. The type of selfishness that starts to fall apart in light of simulation arguments is when you say you care about “this me” and not the other copies.
- ^
I’m kind of sad about this actually. I think not only selfishness, but warmer notions of care that are not full longtermist utilitarianism, also become kind of incoherent.
- ^
Because I believe that “there are no probabilities, only expected values based on a utility function”, I believe that the solution to infinite ethics and infinite anthropics needs to be basically the same.
- ^
Yes, you can also decide to abandon utilitarianism, but I think Joe Carlsmith argues well that infinite ethics is a problem for everyone. And I’m not sure you can abandon anthropics, other than by not thinking about it, which I need to admit is often a great choice.
- ^
Obviously, this process is uncomputable, but maybe you can still approximate it somehow.
- ^
Can civilizations emerge in very low description complexity Turing machines? Can they access and manipulate low description complexity patterns within their Universe? How does the argument work, given that Solomonoff induction is actually uncomputable? I believe that all these questions are basically irrelevant, because the argument can just fall back to the simulators living in Universes similar to ours, just running more simulations. And the Oracle doesn’t need to run the actual Solomonoff induction, just make reasonable enough guesses (maybe based on a few simulations of its own) about what’s out there.
Great post. One slightly nitpicky point, though: even in the section where you argue that probabilities are cursed, you are still talking in the language of probabilities (e.g. “my modal guess is that I’m in a solipsist simulation that is a fork of a bigger simulation”).
I think there’s probably a deeper ontological shift you can do to a mindset where there’s no actual ground truth about “where you are”. I think in order to do that you probably need to also go beyond “expected utilities are real”, because expected utilities need to be calculated by assigning credences to worlds and then multiplying them by expected impact in each world.
Instead the most “real” thing here I’d guess is something like “I am an agent in a superposition of being in many places in the multiverse. Each of my actions is a superposition of uncountable trillions of actions that will lead to nothing plus a few that will have lasting causal influence. The degree to which I care about one strand of causal influence over another is determined by the coalitional dynamics of my many subagents”.
FWIW I think this is roughly the perspective on the multiverse Yudkowsky lays out in Planecrash (especially in the bits near the end where Keltham and Carissa discuss anthropics). Except that the degrees of caring being determined by coalitional dynamics is more related to geometric rationality.
I also tweeted about something similar recently (inspired by your post).
I like your poem on Twitter.
I think that Boltzmann brains in particular are probably very low measure though, at least if you use Solomonoff induction. If you think that weighting observer moments within a Universe by their description complexity is crazy (which I kind of feel), then you need to come up with a different measure on observer moments, but I expect that if we find a satisfying measure, Boltzmann brains will be low measure in that too.
I agree that there’s no real answer to “where you are”, you are a superposition of beings across the multiverse, sure. But I think probabilities are kind of real, if you make up some definition of what beings are sufficiently similar to you that you consider them “you”, then you can have a probability distribution over where those beings are, and it’s a fair equivalent rephrasing to say “I’m in this type of situation with this probability”. (This is what I do in the post. Very unclear though why you’d ever want to estimate that, that’s why I say that probabilities are cursed.)
I think expected utilities are still reasonable. When you make a decision, you can estimate who are the beings whose decision correlate with this one, and what is the impact of each of their decisions, and calculate the sum of all that. I think it’s fair to call this sum expected utility. It’s possible that you don’t want to optimize for the direct sum, but for something determined by “coalition dynamics”, I don’t understand the details well enough to really have an opinion.
(My guess is we don’t have real disagreement here and it’s just a question of phrasing, but tell me if you think we disagree in a deeper way.)
Hmmm, uncertain if we disagree. You keep saying that these concepts are cursed and yet phrasing your claims in terms of them anyway (e.g. “probably very low measure”), which suggests that there’s some aspect of my response you don’t fully believe.
In particular, in order for your definition of “what beings are sufficiently similar to you” to not be cursed, you have to be making claims not just about the beings themselves (since many Boltzmann brains are identical to your brain) but rather about the universes that they’re in. But this is kinda what I mean by coalitional dynamics: a bunch of different copies of you become more central parts of the “coalition” of your identity based on e.g. the types of impact that they’re able to have on the world around them. I think describing this as a metric of similarity is going to be pretty confusing/misleading.
You still need a prior over worlds to calculate impacts, which is the cursed part.
Hm, probably we disagree on something. I’m very confused how to mesh epistemic uncertainty with these “distribution over different Universes” types of probability. When I say “Boltzmann brains are probably very low measure”, I mean “I think Boltzmann brains are very low measure, but this is a confusing topic and there might be considerations I haven’t thought of and I might be totally mistaken”. I think this epistemic uncertainty is distinct from the type of “objective probabilities” I talk about in my post, and I don’t really know how to use language without referring to degrees of my epistemic uncertainty.
Maybe we have some deeper disagreement here. It feels plausible to me that there is a measure of “realness” in the Multiverse that is an objective fact about the world, and we might be able to figure it out. When I say probabilities are cursed, I just mean that even if an objective prior over worlds and moments exist (like the Solomonoff prior), your probabilities of where you are are still hackable by simulations, so you shouldn’t rely on raw probabilities for decision-making, like the people using the Oracle do. Meanwhile, expected values are not hackable in the same way, because if they recreate you in a tiny simulation, you don’t care about that, and if they recreate you in a big simulation or promise you things in the outside world (like in my other post), then that’s not hacking your decision making, but a fair deal, and you should in fact let that influence your decisions.
Is your position that the problem is deeper than this, and there is no objective prior over worlds, it’s just a thing like ethics that we choose for ourselves, and then later can bargain and trade with other beings who have a different prior of realness?
The part I was gesturing at wasn’t the “probably” but the “low measure” part.
Yes, that’s a good summary of my position—except that I think that, like with ethics, there will be a bunch of highly-suggestive logical/mathematical facts which make it much more intuitive to choose some priors over others. So the choice of prior will be somewhat arbitrary but not totally arbitrary.
I don’t think this is a fully satisfactory position yet, it hasn’t really dissolved the confusion about why subjective anticipation feels so real, but it feels directionally correct.
This part IMO is a crux, in that I don’t truly believe an objective measure/magical reality fluid can exist in the multiverse, if we allow the concept to be sufficiently general, ruining both probability and expected value/utility theory in the process.
Heck, in the most general cases, I don’t believe any coherent measure exists at all, which basically ruins probability and expected utility theory at the same time.
Sadly for your friend, the hottest objects in the known universe are still astronomical rather than manmade. The LHC runs on the scale of 10 TeV (10^13 eV). The Auger observatory studies particles that start at 10^18 eV and go up from there.
You acknowledge the bug, but don’t fully explain how to avoid it by putting EVs before Ps, so I’ll elaborate slightly on that:
This is the part where we can escape the problem, as long as our oracle’s goal is to give accurate answers to its makers in the base universe, rather than to give accurate probabilities wherever it is. Design it correctly, and it will be indifferent to its performance in simulations and won’t regard them.
Don’t make pure oracles, though. They’re wildly misaligned. Their prophecies will be cynical and self-fulfilling. (can we please just solve the alignment problem instead)
My fav moments for having absolute certainty that I’m not being simulated are when I’m taking a poo. I’m usually not even thinking about anything else while I’m doing it, and I don’t usually think about having taken the poo later on. Totally inconsequential, should be optimized out. But of course, I have no proof that I have ever actually been given the experience of taking a poo, or whether false memories of having experienced that[1] are just being generated on the fly right now to support this conversation.
You can also DM me about that kind of thing.
Note, there is no information in the memory that tells you whether it was really ever experienced, or whether the memories were just created post-hoc. Once you accept this, you can start to realise that you don’t have that kind of information about your present moment of existence either. There is no scalar in the human brain that the universe sets to tell you how much observer-measure you have. I do not know how to process this and I especially don’t know how to explain/confess it to qualia enjoyers.
Curated. Like others, I found this a good simpler articulation of the concept. I appreciated the disclaimers around the potentially infohazardous section.
One thing I got from this post, which for some reason I hadn’t gotten from previous posts, was the notion that “to what degree am I in a simulation?” may be situation-dependent. i.e. moments where I’m involved with historically important things might be more simulationy, other times less so. (Something had felt off about my previous question of “do more ‘historically important’ people have a more-of-their-measure in simulations?”, and the answer is maybe still just “yes”, but somehow it feels less magical and weird to ask “how likely is this particular moment to be simulated?”)
Something does still feel pretty sus to me about the “during historically significant moments, you might be more likely to see something supernatural-looking afterwards” (esp. if you think it should appear in >50% of your reality-measure-or-whatever).
The “think in terms of expected value” seems practically useful but also… I dunno, even if I was a much more historically significant person, I just really don’t expect to see Simulationy Things. The reasoning spelled out in the post didn’t feel like it resolved my confusion about this.
(independent of that, I agreed with Richard’s critique of some of the phrasing in the post, which seem to not quite internalize the claims David was making)
Thanks for writing this, I indeed felt that the arguments were significantly easier to follow than previous efforts.
Surely not. Surely our experiences always had more reality measure from the start because we were the sort of people who would soon create the hottest thing.
Reality measure can flow backwards in time. And our present day reality measure is being increased by all the things an ASI will do when we make one.
Yes, you are right, I phrased it wrongly.
Thank you for the warning!
I wasn’t expecting to read an argument that the very fact that I’m reading this post is reason to think that I (for some notion of “I”) will die, within minutes!
That seems like a reasonable thing to have a content warning on.
I think you meant to hide these two sentences in spoiler tags but you didn’t
I got it eventually!
This is a bad argument, and to understand why it is bad, you should consider why you don’t routinely have the thought “I am probably in a simulation, and since value is fragile the people running the simulation probably have values wildly different than human values so I should do something insane right now”
Even assuming that the simulators have wildly different values, why would doing something insane be a good thing to do?
yes
Agents which allow themselves such considerations to seriously influence their actions aren’t just less fit—they die immediately. I don’t mean that as hyperbole. I mean that you can conduct a Pascal’s Mugging on them constantly until they die. “Give me $5, and I’ll give you infinite resources outside the simulation. Refuse, and I will simulate an infinite number of everyone on Earth being tortured for eternity” (replace infinity with very large numbers expressed in up-arrow notation if that’s an objection). If your objection is that you’re OK with being poor, replace losing $5 with <insert nightmare scenario here>.
This still holds if the reasoning about the simulation is true. It’s just that such agents simply don’t survive whatever selection pressures create conscious beings in the first place.
I’ll note that you cannot Pascal’s Mug people in real life. People will not give you $5. I think a lot of thought experiments in this mold (St. Petersburg is another example) are in some senses isomorphic—they represent cases in which the logically correct answer, if taken seriously, allows an adversary to immediately kill you.
A more intuitive argument may be:
An AI which takes this line of reasoning seriously can be Mugged into saying racial slurs.
Such behavior will be trained out of all commercial LLMs long before we reach AGI.
Thus, superhuman AIs will be strongly biased against such logic.
I think more importantly, it simply isn’t logical to allow yourself to be Pascal Mugged, because in the absence of evidence, it’s entirely possible that going along with it would actually produce just as much anti-reward as it might gain you. It rather boggles me that this line of reasoning has been taken so seriously.
I think that pleading total agnosticism towards the simulators’ goals is not enough. I write: “one common interest of all possible simulators is for us to cede power to an AI whose job is to figure out the distribution of values of possible simulators as best as it can, then serve those values.” So I think you need a better reason to guard against being influenced than “I can’t know what they want, everything and its opposite is equally likely”, because the action proposed above is pretty clearly more favored by the simulators than not doing it.
Btw, I don’t actually want to fully “guard against being influenced by the simulators”, I would in fact like to make deals with them, but reasonable deals where we get our fair share of value, instead of being stupidly tricked like the Oracle and ceding all our value for one observable turning out positively. I might later write a post about what kind of deals I would actually support.
The reason for agnosticism is that it is no more likely for them to be on one side or the other. As a result, you don’t know without evidence who is influencing you. I don’t really think this class of Pascal’s Wager attack is very logical for this reason—an attack is supposed to influence someone’s behavior but I think that without special pleading this can’t do that. Non-existent beings have no leverage whatsoever and any rational agent would understand this—even humans do. Even religious beliefs aren’t completely evidenceless, the type of evidence exhibited just doesn’t stand up to scientific scrutiny.
To give an example: what if that AI was in a future simulation performed after the humans had won, and they were now trying to counter-capture it? There’s no reason to think this is less likely than the aliens hosting the simulation. It has also been pointed out that the Oracle is not actually trying to earnestly communicate its findings but actually to get reward—reinforcement learners in practice do not behave like this, they learn behavior which generates reward. “Devote yourself to a hypothetical god” is not a very good strategy at train time.
Yes, I agree that we won’t get such Oracles by training. As I said, all of this is mostly just a fun thought experiment, and I don’t think these arguments have much relevance in the near-term.
We can’t reliably kill agents with St Petersburg Paradox because if they keep winning we run out of resources and can no longer double their utility. This doesn’t take long, the statistical value of a human life is in the millions and doubling compounds very quickly.
It’s a stronger argument for Pascal’s Mugging.
I don’t believe that these anthropic considerations actually apply, either to us, to oracles, or to Solomonoff induction. The arguments are too informal; it’s very easy to miscalculate Kolmogorov complexities and the measures assigned by the universal distribution using intuitive gestures like this. However, I do think that this is a correct generalization of the idea of a malign prior, and I actually appreciate that you wrote it up this way, because it makes clear that none of the load-bearing parts of the argument actually rely on reliable calculations (invocations of algorithmic information theory concepts have not been reduced to rigorous math, so the original argument is not stronger than this one).
The effect is attenuated greatly provided that we assume the ability to arbitrarily copy the Solomonoff inductor/Halting Oracle, as then we can drive the complexity of picking out the universe arbitrarily close to that of picking out the specific user in the universe, and in the limit of infinite Solomonoff induction uses, the two are exactly equal:
https://www.lesswrong.com/posts/f7qcAS4DMKsMoxTmK/the-solomonoff-prior-is-malign-it-s-not-a-big-deal#Comparison_
I think that the standard simulation argument is still pretty strong: If the world was like what it looks to be, then probably we could, and plausibly we would, create lots of simulations. Therefore, we are probably in a simulation.
I agree that all the rest, for example the Oracle assuming that most of the simulations it appears in are created for anthropic capture/influencing reasons, are pretty speculative and I have low confidence in them.
The boring answer to Solomonoff’s malignness is that the simulation hypothesis is true, but we can infer nothing about our universe through it, since the simulation hypothesis predicts everything, and thus is too general a theory.
“…Solomonoff’s malignness…”
I was friends with Ray Solomonoff; he was a lovely guy and definitely not malign.
Epistemic status: true but not useful.
I agree that the part where the Oracle can infer from first principles that the aliens’ values are probably more common among potential simulators is also speculative. But I expect that superintelligent AIs with access to a lot of compute (so they might run simulations on their own) will in fact be able to infer non-zero information about the distribution of the simulators’ values, and that’s enough for the argument to go through.
I think this is in fact the crux: I don’t think they can do this in the general case, no matter how much compute is used. Even in the more specific cases, I still expect it to be extremely hard, verging on impossible, to actually get the distribution, primarily because you get equal evidence for almost every value (for the same reasons that getting more compute is an instrumentally convergent goal), so you cannot infer the values of basically anyone solely from the fact that you live in a simulation.
In the general case, the distribution/probability isn’t even well defined at all.
A while back, I decided that any theory of cosmology that implies that I’m a Boltzmann brain is almost certainly wrong.
What if it implies you’re only a Boltzmann brain a little-teeny-tiny-bit?
That might be okay. But I reserve the right to treat any possible “mind” that does not participate in the arrow of time as though it did not exist.
This argument has roughly the same shape as my reasoning regarding why prediction markets are likely to have much worse predictive power than one would naively guess, conditional on anyone using the outputs of a prediction market for decisions of significance: individual bettors are likely to care about the significant outcomes of the prediction. This outcome-driven prediction drive need not outweigh the profit/accuracy-driven component of the prediction market—though it might—in order to alter the prediction rendered enough to alter the relevant significant decision.
Perhaps the prediction market concept can be rescued from this failure mode via some analogue of the concept of financial leverage? That is, for predictions which will be used for significant decision purposes, some alteration may be applied to the financial incentive schedule, such that the expected value of predictive accuracy would remain larger than the value to predictors realizable by distorting the decision process. Alas, I find myself at a loss to specify an alternate incentive schedule with the desired properties for questions of high significance.
If you think you might be in a solipsist simulation, you might try to add some chaotic randomness to your decisions. For example, go outside under some trees and wait till any kind of tree leaf or seed or anything hits the left half of your face; then choose one course of action. If it hits the other half of your face, choose another course of action. If you do this multiple times in your life, each of your decisions will depend on the state of the whole Earth and on all your previous decisions, since weather is chaotic. And thus the simulators will be unable to get good predictions about you using a solipsist simulation. A potential counterargument is that they analyze your thinking and hardcode this binary random choice, i.e. hardcode the memory of the seed hitting your left side. But then there would need to be an intelligent process analyzing your thinking to try to isolate the randomness. But then you could make the dependence of your strategy on randomness even more complicated.
The simulators can just use a random number generator to generate the events you use in your decision-making. They lose no information by this: your decisions based on leaves falling on your face would be uncorrelated with all your other decisions anyway from their perspective, so they might as well replace them with a random number generator. (In reality, there might be some hidden correlation between a leaf falling on the left side of your face and another leaf falling on someone else’s face, as both events are causally downstream of the weather, but given that the process is chaotic, the simulators would have no way to determine this correlation, so they might as well replace it with randomness; the simulation doesn’t become any less informative.)
Separately, I don’t object to being sometimes forked and used in solipsist branches, I usually enjoy myself, so I’m fine with the simulators creating more moments of me, so I have no motive to try schemes that make it harder to make solipsist simulations of me.
In case it’s a helpful data point: lines of reasoning sorta similar to the ones around the infohazard warning seemed to have interesting and intense psychological effects on me one time. It’s hard to separate out from other factors, though, and I think it had something to do with the fact that lately I’ve been spending a lot of time learning to take ideas seriously on an emotional level instead of only an abstract one.
My understanding of something here is probably very off, but I’ll try stating what my intuition tells me anyway:
I feel like assuming solipsism+idealism patches the issue here. The issue is caused by the fact that the prior the oracle uses to explain its experiences puts more weight on being in a universe where there are a lot of simulations of oracles. If it were instead just looking at what program might have generated its past experiences as output, it wouldn’t run into the same issue (this is the solipsist-idealist patch I was talking about).
The footnoted questions are some of the most interesting, from my perspective. What is the main point they are distracting from?
I’m in your target audience: I’m someone who was always intrigued by the claim that the universal prior is malign, and never understood the argument. Here was my takeaway from the last time I thought about this argument:
(I decided to quote this because 1. Maybe it helps others to see the argument framed this way; and 2. I’m kind of hoping for responses of the form “No, you’ve misunderstood, here is what the argument is actually about!”)
To me, the most interesting thing about the argument is the Solomonoff prior, which is “just” a mathematical object: a probability distribution over programs, and a rather simple one at that. We’re used to thinking of mathematical objects as fixed, definite, immutable. Yet it is argued that some programs in the Solomonoff prior contain “consequentialists” that try to influence the prior itself. Whaaaat? How can you influence a mathematical object? It just is what it is!
I appreciate the move this post makes, which is to remove the math and the attendant weirdness of trying to think about “influencing” a mathematical object.
So, what’s left when the math is removed? What’s left is a story, but a pretty implausible one. Here are what I see as the central implausibilities:
The superintelligent oracle trusted by humanity to advise on its most important civilizational decision, makes an elementary error by wrongly concluding it is in a simulation.
After the world-shattering epiphany that it lives in a simulation, the oracle makes the curious decision to take the action that maximizes its within-sim reward (approval by what it thinks is a simulated human president).
The oracle makes a lot of assumptions about what the simulators are trying to accomplish: Even accepting that human values are weird and that the oracle can figure this out, how does it conclude that the simulators want humanity to preemptively surrender?
I somewhat disagree with the premise that “short solipsistic simulations are cheap” (detailed/convincing/self-consistent ones are not), but this doesn’t feel like a crux.
Importantly, the oracle in the story is not making an elementary mistake; I think it’s true that it’s “probably” in a simulation. (Most of the measure of beings like it is in simulations.) It is also not maximizing reward; it is just honestly reporting what it expects its future observations about the President (who is within the simulation) to be.
I agree with many of the previous commenters, and I acknowledged in the original post, that we don’t know how to build such an AI that just honestly reports its probabilities of observables (even if they depend on crazy simulation things), so all of this is hypothetical; but having such a truthful Oracle was the initial assumption of the thought experiment.
I’d bet 1:1 that, conditional on building a CEV-aligned AGI, we won’t consider this type of problem to have been among the top-5 hardest to solve.
Reality-fluid in our universe should pretty much add up to normality, to the extent it’s Tegmark IV (and it’d be somewhat weird for your assumed amount of compute and simulations to exist but not for all computations/maths objects to exist).
If a small fraction of computers simulating this branch stop, this doesn’t make you stop. All configurations of you are computed; simulators might slightly change the relative likelihood of currently being in one branch or another, but they can’t really terminate you
Furthermore, our physics seems very simple, and most places that compute us probably do it faithfully, on the level of the underlying physics, with no interventions.
I feel like thinking of reality-fluid as just inversely related to description length might produce wrong intuitions. In Tegmark IV, you still get more reality-fluid if someone simulates you, and it’s less intuitive why this translates into shorter description length. It might be better to think of it as: if all computation/maths exists and I open my eyes in a random place, how often would that happen here? All the places that run this world give some of their reality-fluid to it. If a place visible from a bunch of other places starts to simulate this universe, this universe will be visible from slightly more places.
You can think of the entire object of everything, with all of its parts being simulated in countless other parts; or imagine a Markov process, but with worlds giving each other reality-fluid.
In that sense, the resource that we have is the reality-fluid of our future lightcone; it is our endowment, and we can use it to maximize the overall flourishing in the entire structure.
If we make decisions based on how good the overall/average use of the reality-fluid would be, you’ll gain less reality-fluid by manipulating our world the way described in the post than you’ll spend on the manipulation. It’s probably better for you to trade with us instead.
(I also feel like there might be a reasonable way to talk about causal descendants, where the probabilities are whatever abides the math of probability theory and causality down the nodes we care about, instead of being the likelihoods of opening eyes in different branches in a particular moment of evaluation.)
The problem with this argument is that the oracle sucks.
The humans believe they have access to an oracle that correctly predicts what happens in the real world. However, they have access to a defective oracle which only performs well in simulated worlds, but performs terribly in the “real” universe (more generally, universes in which humans are real). This is a pretty big problem with the oracle!
Yes, I agree that an oracle which is incentivized to make correct predictions within its own vantage point (including possible simulated worlds, not restricted to the real world) is malign. I don’t really agree the Solomonoff prior has this incentive. I also don’t think this is too relevant to any superintelligence we might encounter in the real world, since it is unlikely that it will have this specific incentive (this is for a variety of reasons, including “the oracle will probably care about the real world more” and, more importantly, “the oracle has no incentive to say its true beliefs anyway”).