Here’s a simple argument that simulating universes based on Turing machine number can give manipulated results.
Say we lived in a universe much like this one, except that:
The universe is deterministic
It’s simulated by a very short Turing machine
It has a center, and
That center is actually nearby! We can send a rocket to it.
So we send a rocket to the center of the universe and leave a plaque saying “the answer to all your questions is Spongebob”. Now any aliens in other universes that simulate our universe and ask “what’s in the center of that universe at time step 10^1000?” will see the plaque, search elsewhere in our universe for the reference, and watch Spongebob. We’ve managed to get aliens outside our universe to watch Spongebob.
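Here's a toy rendering of the trick, with everything hypothetical: `step` is a stand-in for whatever deterministic rule our short Turing machine runs, and the plaque line stands in for the inhabitants' choices. The point is only the mechanics: whoever controls the queried location controls what outside simulators see.

```python
def step(universe: list[str], t: int) -> list[str]:
    """One deterministic update of the toy universe (rule elided)."""
    if t == 3:  # at some point, the inhabitants reach the center...
        universe[len(universe) // 2] = "the answer is Spongebob"  # the plaque
    return universe

def aliens_query_center(steps: int) -> str:
    """What outside simulators do: run the short machine, look at the center."""
    universe = ["empty"] * 101
    for t in range(steps):
        universe = step(universe, t)
    return universe[len(universe) // 2]

print(aliens_query_center(steps=1000))  # -> "the answer is Spongebob"
```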
I feel like it would be helpful to speak precisely about the universal prior. Here’s my understanding.
It’s a partial probability distribution (a semimeasure) over bitstrings. It gives a non-zero probability to every bitstring, but these probabilities add up to strictly less than 1. It’s defined as follows:
$$P(\text{bitstring}) = \sum_{\text{code}} \frac{\mathbf{1}[\,\mathrm{TM}(\text{code}) \text{ halts with bitstring on its tape}\,]}{2^{\mathrm{len}(\text{code})}}$$
That is, describe Turing machines by a binary code, and assign each one a probability based on the length of its code, such that those probabilities add up to exactly 1. Then magically run all Turing machines “to completion”. For those that halt leaving a bitstring on their tape, attribute the probability of that Turing machine to that bitstring. Now we have a probability distribution over bitstrings, though the probabilities add up to less than one because not all of the Turing machines halted.
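One step worth making explicit (my gloss, not in the original): "add up to exactly 1" requires the codes to form a complete prefix-free set, in which case Kraft's inequality holds with equality:

$$\sum_{\text{code}} 2^{-\mathrm{len}(\text{code})} = 1$$

With a prefix-free but incomplete code the sum would be strictly below 1, which only makes the prior "more partial"; nothing downstream changes.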
You cannot compute this probability distribution, but you can compute lower bounds on the probabilities of its bitstrings. (The Nth lower bound is the probability distribution you get from running the first N TMs for N steps.)
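A minimal sketch of that Nth lower bound, assuming a hypothetical `run(max_steps)` interface for each machine that returns its output bitstring if it halts within the budget and `None` otherwise:

```python
from fractions import Fraction

def nth_lower_bound(machines, n):
    """The n-th computable lower bound on the universal prior: run the
    first n machines for n steps each, crediting 2^-len(code) to the
    output of every machine that halts within that budget.

    `machines` is a list of (code, run) pairs; `run(max_steps)` is a
    hypothetical interface returning the output bitstring on halting
    within max_steps, else None.
    """
    dist: dict[str, Fraction] = {}
    for code, run in machines[:n]:
        output = run(max_steps=n)
        if output is not None:
            dist[output] = dist.get(output, Fraction(0)) + Fraction(1, 2 ** len(code))
    return dist  # total mass < 1: machines that haven't halted yet contribute nothing

# Toy usage: one machine halts immediately with "01"; one never halts.
machines = [
    ("0",  lambda max_steps: "01"),   # halts right away
    ("10", lambda max_steps: None),   # never halts within any budget
]
print(nth_lower_bound(machines, n=2))  # {'01': Fraction(1, 2)}
```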
Call a TM that halts poisoned if its output is determined as follows (a toy sketch of the shape appears after these three steps):
The TM simulates a complex universe full of intelligent life, then selects a tiny portion of that universe to output, erasing the rest.
That intelligent life realizes this might happen, and writes messages in many places that could plausibly be selected.
It works, and the TM’s output is determined by what the intelligent life it simulated chose to leave behind.
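The real construction obviously can't be run, but a toy of its shape can be; everything below is hypothetical scaffolding, and nothing in it actually simulates intelligent life:

```python
import random

def poisoned_tm(code_seed: int) -> str:
    """Toy shape of a 'poisoned' machine; deterministic given its code."""
    rng = random.Random(code_seed)
    # 1. "Simulate a complex universe" (stand-in: a big deterministic tape).
    tape = [str(rng.getrandbits(1)) for _ in range(1_000)]
    # 2. The inhabitants guess which tiny portion will be selected and write
    #    their chosen message at every plausible location.
    message = "0110"  # whatever bitstring they want outsiders to see
    for loc in (0, len(tape) // 2, len(tape) - 1):
        tape[loc] = message
    # 3. The machine selects a tiny portion to output and erases the rest.
    return tape[len(tape) // 2]

print(poisoned_tm(code_seed=7))  # -> "0110": the inhabitants' choice
```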
If we approximate the universal prior, the probability contribution of poisoned TMs will be precisely zero, because we don’t have nearly enough compute to simulate a poisoned TM until it halts. However, if there’s an outer universe with dramatically more compute available, and it approximates the universal prior with enough computational power to actually run the poisoned TMs to completion, they’ll affect the probability distribution over bitstrings, making the bitstrings carrying the messages the simulated life chose to leave behind more likely.
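Putting back-of-envelope numbers on this (mine, not from the original): a poisoned TM whose code is $\ell$ bits long and which halts after $T$ steps contributes

$$\begin{cases} 0 & \text{to the } N\text{th lower bound, whenever } N < T,\\ 2^{-\ell} & \text{to any approximation that runs it to completion.}\end{cases}$$

Since a universe-simulating machine can have a very short code, that $2^{-\ell}$ can be a large fraction of all the mass on its long output string, whose other descriptions are far longer.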
So I think Paul’s right, actually (not what I expected when I started writing this). If you approximate the UP well enough, the distribution you see will have been manipulated.

Very curious what part of this people think is wrong.
I don’t see how any of it can be right. Getting one algorithm to output Spongebob wouldn’t cause the SI to watch Spongebob; even a less silly claim in that vein would still be false. The Platonic agent would know the plan wouldn’t work, and thus wouldn’t do it.
Since no individual Platonic agent could do anything meaningful alone, and they plainly can’t communicate with each other, they can only coordinate by means of reflective decision theory. That’s fine, we’ll just assume that’s the obvious way for intelligent minds to behave. But then the SI works the same way, and knows the Platonic agents will think that way, and per RDT it refuses to change its behavior based on attempts to game the system. So none of this ever happens in the first place.
(This is without even considering the serious problems with assuming Platonic agents would share a goal to coordinate on. I don’t think I buy it. You can’t evolve a desire to come into existence, nor does an arbitrary goal seem to require it. Let me assure you, there can exist intelligent minds which don’t want worlds like ours to exist.)