“At its core, this is the main argument why the Solomonoff prior is malign: a lot of the programs will contain agents with preferences, these agents will seek to influence the Solomonoff prior, and they will be able to do so effectively.”
First, this is irrelevant to most applications of the Solomonoff prior. If I’m using it to check the randomness of my random number generator, I’m going to be looking at 64-bit strings, and probably very few intelligent-life-producing universe-simulators output just 64 bits, and it’s hard to imagine why an alien in a simulated universe would want to bias my RNG anyway.
The S. prior is a general-purpose prior which we can apply to any problem. The output string has no meaning except in a particular application and representation, so it seems senseless to try to influence the prior for a string when you don’t know how that string will be interpreted.
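For reference, the prior under discussion has a standard form (stated here in my notation): fix a universal machine U, and give a finite string x the combined weight of every program whose output begins with x,

$$M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-|p|},$$

where U(p) = x* means the output of p starts with x. Nothing in that definition says what x will be used for.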
Can you give an instance of an application of the S. prior in which, if everything you wrote were correct, it would matter?
Second, it isn’t clear that this is a bug rather than a feature. Say I’m developing a program to compress photos. I’d like to be able to ask “what are the odds of seeing this image, ever, in any universe?” That would probably compress images of plants and animals better than other priors, because in lots of universes life will arise and evolve, and features like radial symmetry, bilateral symmetry, leaves, legs, etc., will arise in many universes. This biasing of priors by evolution doesn’t seem to me different from biasing of priors by intelligent agents; evolution is smarter than any agent we know. And I’d like to get biasing from intelligent agents, too; then my photo-compressor might compress images of wheels and rectilinear buildings better.
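To make the compression point concrete, here is a toy sketch of the standard prior-to-code-length correspondence; the 4-bit “images”, the palindrome test standing in for evolved regularities, and the 3x weight are all invented for illustration. An ideal entropy coder spends about -log2 P(x) bits on x, so any prior that concentrates mass on patterns that actually recur, whatever produced that concentration, compresses them better.

```python
import math

# Toy "images": all 4-bit patterns. The biased prior gives extra weight to
# palindromes, a stand-in for features (symmetry, legs, leaves) that many
# universes' evolutions keep reinventing.
patterns = [f"{i:04b}" for i in range(16)]

def uniform_prior(x):
    return 1 / 16

def symmetry_biased_prior(x):
    # Hypothetical prior: palindromic patterns get 3x the weight, then normalize.
    weights = {p: (3.0 if p == p[::-1] else 1.0) for p in patterns}
    return weights[x] / sum(weights.values())

def ideal_code_length(prior, x):
    # An ideal entropy coder spends about -log2 P(x) bits encoding x.
    return -math.log2(prior(x))

for x in ["0110", "0111"]:  # symmetric vs. asymmetric example
    print(x,
          f"uniform: {ideal_code_length(uniform_prior, x):.2f} bits,",
          f"biased: {ideal_code_length(symmetry_biased_prior, x):.2f} bits")
```

The symmetric pattern costs 3 bits under the biased prior versus 4 under the uniform one, paid for by slightly longer codes for everything else; that trade is all “biasing the prior” amounts to here.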
Also in the category of “it’s a feature, not a bug” is that, if you want your values to be right, and there’s a way of learning the values of agents in many possible universes, you ought to try to figure out what their values are, and update towards them. This argument implies that you can get that for free by using Solomonoff priors.
(If you don’t think your values can be “right”, but instead you just believe that your values morally oblige you to want other people to have those values, you’re not following your values, you’re following your theory about your values, and probably read too much LessWrong for your own good.)
Third, what do you mean by “the output” of a program that simulates a universe? How are we even supposed to notice the infinitesimal fraction of that universe’s output which the aliens are influencing to subvert us? Take your example of Life—is the output a raster scan of the 2D bit array left when the universe goes static? In that case, agents have little control over the terminal state of their universe (and also, in the case of Life, the string will be either almost entirely zeroes, or almost entirely 1s, and those both already have huge Solomonoff priors). Or is it the concatenation of all of the states it goes through, from start to finish? In that case, by the time intelligent agents evolve, their universe will have already produced more bits than our universe can ever read.
Are you imagining that bits are never output unless the accidentally-simulated aliens choose to output a bit? I can’t imagine any way that could happen, at least not if the universe is specified with a short instruction string.
This brings us to the 4th problem: It makes little sense to me to worry about averaging in outputs from even mere planetary simulations if your computer is just the size of a planet, because it won’t even have enough memory to read in a single output string from most such simulations.
5th, you can weigh each program’s output proportional to 2^-T, where T is the number of steps it takes the TM to terminate. You’ve got to do something like that anyway, because you can’t run TMs to completion one after another; you’ve got to do something like take a large random sample of TMs and iteratively run each one step. Problem solved.
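A minimal sketch of one natural reading of this scheme; the toy “programs”, their lengths, and their halting times are invented (a real approximation would enumerate programs for a universal machine). Dovetail by running every sampled machine one step per round, and let a program of length |p| that halts at step T with output x contribute 2^-(|p|+T) to x’s weight.

```python
from collections import defaultdict

# Toy stand-ins for sampled TMs: (program bits, halting step, output string).
toy_programs = [
    ("0",    2, "1010"),
    ("10",   5, "1010"),
    ("110",  3, "0000"),
    ("111", 50, "1111"),   # slow program: crushed by the 2**-T factor
]

def dovetailed_weights(programs, max_steps=100):
    """Run every program one step per round; a halter at step T adds 2**-(len(p)+T)."""
    weight = defaultdict(float)
    steps_run = {p: 0 for p, _, _ in programs}
    table = {p: (halt_at, out) for p, halt_at, out in programs}
    for _ in range(max_steps):
        for p in list(steps_run):
            steps_run[p] += 1
            halt_at, out = table[p]
            if steps_run[p] >= halt_at:        # this program halts now
                weight[out] += 2.0 ** -(len(p) + halt_at)
                del steps_run[p]
    return dict(weight)

print(dovetailed_weights(toy_programs))
```

The slow program contributes essentially nothing, which is the intended effect of the 2^-T factor.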
Maybe I’m misunderstanding something basic, but I feel like we’re debating how many angels can dance on the head of a pin.
Perhaps the biggest problem is that you’re talking about an entire universe of intelligent agents conspiring to change the “output string” of the TM that they’re running in. This requires them to realize that they’re running in a simulation, and that the output string they’re trying to influence won’t even be looked at until they’re all dead and gone. That doesn’t seem to give them much motivation to devote their entire civilization to twiddling bits in their universe’s final output in order to shift our priors infinitesimally. And if it did, the more likely outcome would be an intergalactic war over what string to output.
(I understand your point about them trying to “write themselves into existence, allowing them to effectively ‘break into’ our universe”, but as you’ve already required their TM specification to be very simple, this means the most they can do is cause some type of life that might evolve in their universe to break into our universe. This would be like humans on Earth devoting the next billion years to tricking God into re-creating slime molds after we’re dead. Whereas the things about themselves that intelligent agents actually care about and self-identify with are those things that distinguish them from their neighbors. Their values will be directed mainly towards opposing the values of other members of their species. None of those distinguishing traits can be implicit in the TM, and even if they could, they’d cancel each other out.)
Now, if they were able to encode a message to us in their output string, that might be more satisfying to them. Like, maybe, “FUCK YOU, GOD!”
“The S. prior is a general-purpose prior which we can apply to any problem. The output string has no meaning except in a particular application and representation, so it seems senseless to try to influence the prior for a string when you don’t know how that string will be interpreted.”
The claim is that consequentialists in simulated universes will model the decisions that are being made based on the Solomonoff prior, so they will know how that string will be interpreted.
“Can you give an instance of an application of the S. prior in which, if everything you wrote were correct, it would matter?”
Any decision that controls substantial resource allocation will do. For example, if we’re using it to evaluate the impact of running various programs, blowing up planets, interfering with alien life, etc.
“Also in the category of “it’s a feature, not a bug” is that, if you want your values to be right, and there’s a way of learning the values of agents in many possible universes, you ought to try to figure out what their values are, and update towards them. This argument implies that you can get that for free by using Solomonoff priors.”
If you are a moral realist, this does seem like a possible feature of the Solomonoff prior.
“Third, what do you mean by “the output” of a program that simulates a universe?”
A TM that simulates a universe must also specify an output channel.
“Take your example of Life—is the output a raster scan of the 2D bit array left when the universe goes static? In that case, agents have little control over the terminal state of their universe (and also, in the case of Life, the string will be either almost entirely zeroes, or almost entirely 1s, and those both already have huge Solomonoff priors). Or is it the concatenation of all of the states it goes through, from start to finish?”
All of the above. We are running all possible TMs, so all computable universes will be paired with all computable output channels. It’s just a question of complexity.
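(For concreteness, the bookkeeping I take this to mean, which is my gloss rather than something stated above: if a program factors into a universe-simulator u plus an output channel, a “camera” c, the pair contributes on the order of

$$2^{-(|u| + |c|)}$$

to the prior for whatever the camera reports, so exotic output channels aren’t excluded, they’re just exponentially down-weighted by every extra bit needed to specify them.)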
“Are you imagining that bits are never output unless the accidentally-simulated aliens choose to output a bit? I can’t imagine any way that could happen, at least not if the universe is specified with a short instruction string.”
No.
“This brings us to the 4th problem: It makes little sense to me to worry about averaging in outputs from even mere planetary simulations if your computer is just the size of a planet, because it won’t even have enough memory to read in a single output string from most such simulations.”
I agree that approximating the Solomonoff prior is difficult, and thus its malignancy probably doesn’t matter in practice. I do think similar arguments apply to cases that do matter.
“5th, you can weigh each program’s output proportional to 2^-T, where T is the number of steps it takes the TM to terminate. You’ve got to do something like that anyway, because you can’t run TMs to completion one after another; you’ve got to do something like take a large random sample of TMs and iteratively run each one step. Problem solved.”
See the section on the Speed prior.
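(My rough gloss, which may not match that section’s exact formulation: the Speed prior discounts a program’s contribution by its running time rather than exponentially in it, roughly

$$S(x) \;\approx\; \sum_{p \,:\, U(p) = x\ast} \frac{2^{-|p|}}{t_p},$$

with t_p the time p takes to produce x, so the 2^-T weighting proposed above is a much more aggressive cutoff.)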
“Perhaps the biggest problem is that you’re talking about an entire universe of intelligent agents conspiring to change the “output string” of the TM that they’re running in. This requires them to realize that they’re running in a simulation, and that the output string they’re trying to influence won’t even be looked at until they’re all dead and gone. That doesn’t seem to give them much motivation to devote their entire civilization to twiddling bits in their universe’s final output in order to shift our priors infinitesimally. And if it did, the more likely outcome would be an intergalactic war over what string to output.”
They don’t have to realize they’re in a simulation, they just have to realize their universe is computable. Consequentialists care about their values after they’re dead. The cost of influencing the prior might not be that high, because they only have to compute it once, and the benefit might be enormous. Exponential decay + acausal trade make an intergalactic war unlikely.