I didn’t quite get this, so let me try restating what I mean.
Let’s say the states and rules for manipulating the worktapes are totally fixed and known, and we’re just uncertain about the rules for outputting something to the output tape.
Zero of these correspond to reading off the bits from a camera (or dataset) embedded in the world. Any output rule that lets you read off precisely the bits from the camera is going to involve adding a bunch of new states to the Turing machine.
So let’s instead consider the space of all ways that you can augment a given physics to produce outputs. This will involve adding a ton of states to the Turing machine. Most of them produce really garbage-y outputs, but a tiny fraction do something intelligent that produces a coherent-looking output.
Some fraction of those involve reading off the bits from a particular camera embedded in the world. Let’s arbitrarily say it’s 2^-1000?
Now consider any given intervention that we can perform to try to manipulate the prior. For example, we can perform a high-energy physics experiment that produces an unprecedented interaction, and control a parameter of how that interaction occurs. We can write the bits in the pattern of giant star-sized clumps of matter. Or we can send the relevant bits out to infinity with unprecedentedly high energies. Or we can build a quadrillion cameras throughout the world. Or we can change the nature of the camera so that more of the possible output rules read off its values. Or we can alter the distribution of matter when the universe comes apart, so that a Turing machine can read it off from that. Or whatever.
It seems to me that “random camera on old Earth” is probably less likely to be output by the physics-extension than some of these other ways of encoding data. For example, maybe 2^-600 of all output rules end up reading off data from the highest-energy events in the universe, and we can influence that.
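Just to make the arithmetic explicit with those made-up numbers (they are purely illustrative): the universal prior's mass on an output channel scales like 2 to the minus its description length, so a 2^-600 channel beats a 2^-1000 channel by a factor of 2^400. A minimal sketch:

```python
from fractions import Fraction

# Purely illustrative numbers from the discussion above.
camera_mass = Fraction(1, 2**1000)       # "read off the bits from a random camera on old earth"
high_energy_mass = Fraction(1, 2**600)   # "read off data from the highest-energy events"

odds = high_energy_mass / camera_mass
print(odds == 2**400)  # True: the controllable channel gets 2^400 times more mass
```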
I think the only thing that really cuts against this is that a random camera on old earth (i) is earlier in history, (ii) takes place in a world with less interesting stuff going on (so that e.g. rules like “find some interesting structure and then read from it” have an easier time finding it). But those effects appear to be tiny potatoes (and I don’t feel like you are engaging with them yet because you have prior objections).
Overall I’m playing a game of thinking about the distribution of output channels implied by the universal prior, and the probability that distribution places on “Camera on old earth” vs “Best things that a sophisticated civilization can control.” I feel like you are playing some different game.
With maximum entropy beliefs about the output channel, those silly no free lunch theorems of optimization do actually apply.
Most of the possible output rules are not controllable, so you can ignore them. And then amongst those that are controllable, you can control many of them at once.
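And the mass you steer just adds up across the channels you control, which is what “control many of them at once” buys you. A toy sum with hypothetical per-channel costs (all numbers made up):

```python
from fractions import Fraction

# Hypothetical description-length costs (in bits) of several output channels
# that a sophisticated civilization could deliberately target at once.
controllable_costs = [600, 620, 650, 700]   # made-up numbers
camera_cost = 1000                          # made-up cost of the camera channel

controlled_mass = sum(Fraction(1, 2**c) for c in controllable_costs)
camera_mass = Fraction(1, 2**camera_cost)

print(controlled_mass > camera_mass)  # True under these assumptions
```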
Maybe one more example to try to communicate why this seems so wrong to me.
Suppose the physics turns out to be the game of life. A dataset on old earth is some totally random configuration of cells (you have to read off the magnetization of a piece of a hard drive, which is traveling through physical space at some high rate...). In order to read that off you need to add a ton of stuff to the TM that was simulating the game of life. You aren’t doing something like “Every time you enter state 23, which is used to update a cell when it has 3 live neighbors, output a 0 and move the output head to the right.” It’s just not even in that vague zone, not even a little bit.
(This is unrealistic both because (i) the game of life is an extremely unnatural physics to emerge under a TM-based universal prior, and (ii) our physics is completely unlike the game of life. But I think the point is even stronger if you pick a more realistic example, and I just want to try to get more and more concrete to understand where the miscommunication is happening.)
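To make the contrast concrete, here is a toy sketch of my own (nothing above specifies it): a game-of-life “physics” plus the kind of dirt-cheap output rule a few extra TM states could plausibly implement, namely reading one designated cell per timestep. Decoding the magnetization of a moving hard drive instead means recognizing and tracking a high-level structure that is only meaningful because of the drive’s design, which is nowhere near this simple.

```python
import numpy as np

def life_step(grid):
    """One update of Conway's game of life on a toroidal grid of 0s and 1s."""
    neighbors = sum(
        np.roll(np.roll(grid, dr, axis=0), dc, axis=1)
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    born = (grid == 0) & (neighbors == 3)
    survive = (grid == 1) & ((neighbors == 2) | (neighbors == 3))
    return (born | survive).astype(np.uint8)

def simple_output_rule(grid, steps, cell=(0, 0)):
    """A 'cheap' extraction rule: emit the value of one fixed cell each timestep."""
    bits = []
    for _ in range(steps):
        bits.append(int(grid[cell]))
        grid = life_step(grid)
    return bits

# Toy usage with a random initial condition. A rule like this sits near the
# "output something whenever you touch this cell" end of the spectrum;
# decoding a hard drive's magnetization pattern is nothing like it.
rng = np.random.default_rng(0)
world = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)
print(simple_output_rule(world, steps=16))
```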
But now that we’ve learned that physics is the game of life, we can make much better guesses about how to build a dataset so that a TM could output it. For example, we can:
Build the dataset at a large number of places. This appears to just linearly increase the probability that it gets read off by many kinds of rules (see the toy count after this list). If the fraction of interesting structure devoted to our copies of the dataset is larger than the fraction of interesting structure devoted to the dataset on old earth, that already seems to me like enough for us to win the game.
Perform an incredibly complex engineering project to, e.g., encode the dataset in the values of a particular cell in the game of life over a series of contiguous timesteps. This seems easier than reading off a dataset embedded in some random set of cells, which is in turn easier than reading off a dataset not even defined by cells but by higher-level patterns that are meaningful to us only because of the design of the hard drive.
Suppose we discover that the initial condition involved filling in cells in a tiny region of space and leaving the rest blank. Then I claim there are more TMs that read off values from that region of space than from distant regions of space (the fraction of TMs reading off values from any given part of space has to fall to 0 as you consider more and more distant parts).
We can encode the dataset in a huge number of different ways all at once, so that many different output rules will hit it.
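Here is a toy count of my own (made-up parameters) for the “many places / many encodings” point: if the simplest extraction rules are roughly of the form “read off cell (r, c) every timestep”, spread over some bounded region, then writing the dataset at k sites catches roughly k times as many of them.

```python
# Toy model, made-up parameters: suppose the simplest extraction rules are
# "read off cell (r, c) every timestep", one rule per cell in a bounded region.
REGION_SIDE = 1000                # side length of the region such rules point at
TOTAL_RULES = REGION_SIDE ** 2    # one simple rule per cell

def fraction_of_rules_hit(num_sites, cells_per_site=1):
    """Fraction of the simple cell-readout rules that land on a copy of the dataset."""
    return num_sites * cells_per_site / TOTAL_RULES

print(fraction_of_rules_hit(1))        # dataset written at one site
print(fraction_of_rules_hit(10_000))   # dataset written at 10,000 sites
print(fraction_of_rules_hit(10_000) / fraction_of_rules_hit(1))  # ~10,000x more rules hit it
```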
You might think that some very different kind of rule dominates the probability of the camera embedded in the game of life, so that none of those arguments are relevant. For example, maybe you think that most of the probability comes from a TM that works by generating the game of life, then looping over small extraction methods until it finds one that has a certain property, and then using that extraction method to produce an output. I’m game for whatever alternative you want to propose; that is, I challenge you to find any plausible description of a rule that outputs the bits observed by a camera, for which I can’t describe a simpler extraction rule that would output some set of bits controlled by the sophisticated civilization.
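For concreteness, that alternative hypothesis has roughly the following shape. This is a hedged sketch; the enumeration order and the coherence predicate are placeholders I am making up, not anything either of us has specified.

```python
# Sketch of the alternative hypothesis: simulate the physics, then loop over
# extraction methods from simplest to more complex and output the first one
# whose output satisfies some "coherence" property. All names are placeholders.

def extract_and_output(history, enumerate_extractors, looks_coherent, n_bits):
    """Return the output of the simplest extractor whose output looks coherent."""
    for extractor in enumerate_extractors():
        bits = extractor(history, n_bits)
        if looks_coherent(bits):
            return bits
    return None

# Trivial usage with toy placeholders, just to show the control flow.
toy_history = [0, 1, 1, 0, 1, 0, 1, 1]
extractors = lambda: iter([lambda h, n: h[:n], lambda h, n: h[-n:]])
coherent = lambda bits: sum(bits) >= 2   # placeholder "looks coherent" test
print(extract_and_output(toy_history, extractors, coherent, n_bits=4))

# The challenge above then becomes: for any such rule that ends up outputting
# what a camera sees, exhibit a strictly simpler extractor whose output the
# sophisticated civilization controls; a loop like this would hit that one first.
```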
I take your point that the output rules we are discussing add extra computation states, and so some output rules will add fewer computation states than others.
I’m merging my response to the rest with my comment here.