I hope I’m not misinterpreting your point, and sorry if this comment comes across as frustrated at some points.
I’m not sure you’re misinterpreting me per se, but there are some tacit premises in the background of my argument that you don’t seem to hold. Rather than responding point-by-point, I’ll just say some more stuff about where I’m coming from, and we’ll see if it clarifies things.
You talk a lot about “idealized theories.” These can of course be useful. But not all idealizations are created equal. You have to actually check that your idealization is good enough, in the right ways, for the sorts of things you’re asking it to do.
In physics and applied mathematics, one often finds oneself considering a system that looks like
some “base” system that’s well-understood and easy to analyze, plus
some additional nuance or dynamic that makes things much harder to analyze in general—but which we can safely assume has much smaller effects than the rest of the system
We quantify the size of the additional nuance with a small parameter ϵ. If ϵ is literally 0, that’s just the base system, but we want to go a step further: we want to understand what happens when the nuance is present, just very small. So, we do something like formulating the solution as a power series in ϵ, and truncating to first order. (This is perturbation theory, or more generally asymptotic analysis.)
This sort of approximation gets better and better as ϵ gets closer to 0, because this magnifies the difference in size between the truncated terms (of size O(ϵ²) and smaller) and the retained O(ϵ) term. In some sense, we are studying the ϵ→0 limit.
But we’re specifically interested in the behavior of the system given a nonzero, but arbitrarily small, value of ϵ. We want an approximation that works well if ϵ=10⁻⁶, and even better if ϵ=10⁻¹², and so on. We don’t especially care about the literal ϵ=0 case, except insofar as it sheds light on the very-small-but-nonzero behavior.
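To make that concrete, here is a minimal numerical sketch of a regular perturbation expansion (a toy example of my own, nothing from the discussion itself): the positive root of x² + ϵx − 1 = 0, where the first-order truncation really does have O(ϵ²) error.

```python
import numpy as np

# Regular perturbation toy example: positive root of x^2 + eps*x - 1 = 0.
# Exact root:          x(eps) = (-eps + sqrt(eps^2 + 4)) / 2
# First-order series:  x(eps) ~ 1 - eps/2 + O(eps^2)

def exact_root(eps):
    return (-eps + np.sqrt(eps**2 + 4)) / 2

def first_order(eps):
    return 1 - eps / 2

for eps in [1e-1, 1e-3, 1e-6]:
    err = abs(exact_root(eps) - first_order(eps))
    print(f"eps={eps:.0e}  truncation error = {err:.2e}  (~ eps^2 / 8)")
```

The truncation error shrinks like ϵ², exactly the "better and better as ϵ gets closer to 0" behavior described above.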
Now, sometimes the ϵ→0 limit of the “very-small-but-nonzero behavior” simply is the ϵ=0 behavior, the base system. That is, what you get at very small ϵ looks just like the base system, plus some little O(ϵ)-sized wrinkle.
But sometimes – in so-called “singular perturbation” problems – it doesn’t. Here the system has qualitatively different behavior from the base system given any nonzero ϵ, no matter how small.
Typically what happens is that ϵ ends up determining, not the magnitude of the deviation from the base system, but the “scale” of that deviation in space and/or time. So that in the limit, you get behavior with an O(1)-sized difference from the base system’s behavior that’s constrained to a tiny region of space and/or oscillating very quickly.
Boundary layers in fluids are a classic example. Boundary layers are tiny pockets of distinct behavior, occurring only in small ϵ-sized regions and not in most of the fluid. But they make a big difference just by being present at all. Knowing that there’s a boundary layer around a human body, or an airplane wing, is crucial for predicting the thermal and mechanical interactions of those objects with the air around them, even though it takes up a tiny fraction of the available space, the rest of which is filled by the object and by non-boundary-layer air. (Meanwhile, the planetary boundary layer is tiny relative to the earth’s full atmosphere, but, uh, we live in it.)
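Here is the simplest toy version of a singular perturbation I know (algebraic rather than a boundary layer, and again just my own illustration): ϵx² + x − 1 = 0. At ϵ=0 the "base system" has a single root, x=1; for any nonzero ϵ there is a second root near −1/ϵ, a qualitatively new feature living at the 1/ϵ scale, which the ϵ=0 idealization misses entirely.

```python
import numpy as np

# Singular perturbation toy example: eps*x^2 + x - 1 = 0.
# At eps = 0 the base system x - 1 = 0 has one root, x = 1.
# For any eps > 0 there are two roots: one near 1 (a small O(eps) correction)
# and one near -1/eps, which has no counterpart in the base system and runs
# off to -infinity as eps -> 0.

for eps in [1e-1, 1e-3, 1e-6]:
    roots = np.sort(np.roots([eps, 1.0, -1.0]))  # coefficients of eps*x^2 + x - 1
    print(f"eps={eps:.0e}  roots = {roots}")
```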
In the former case (“regular perturbation problems”), “idealized” reasoning about the ϵ=0 case provides a reliable guide to the small-but-nonzero behavior. We want to go further, and understand the small-but-nonzero effects too, but we know they won’t make a qualitative difference.
In the singular case, though, the ϵ=0 “idealization” is qualitatively, catastrophically wrong. If you make an idealization that assumes away the possibility of boundary layers, then you’re going to be wrong about what happens in a fluid – even about the big, qualitative, O(1) stuff.
You need to know which kind of case you’re in. You need to know whether you’re assuming away irrelevant wrinkles, or whether you’re assuming away the mechanisms that determine the high-level, qualitative, O(1) stuff.
Back to the situation at hand.
In reality, TMs can only do computable stuff. But for simplicity, as an “idealization,” we are considering a model where we pretend they have a UP oracle, and can exactly compute the UP.
We are justifying this by saying that the TMs will try to approximate the UP, and that this approximation will be very good. So, the approximation error is an O(ϵ)-sized “additional nuance” in the problem.
Is this more like a regular perturbation problem, or more like a singular one? Singular, I think.
The ϵ=0 case, where the TMs can exactly compute the UP, is a problem involving self-reference. We have a UP containing TMs, which in turn contain the very same UP.
Self-referential systems have a certain flavor, a certain “rigidity.” (I realize this is vague, sorry, I hope it’s clear enough what I mean.) If we have some possible behavior of the system X, most ways of modifying X (even slightly) will not produce behaviors which are themselves possible. The effect of the modification as it “goes out” along the self-referential path has to precisely match the “incoming” difference that would be needed to cause exactly this modification in the first place.
“Stable time loop”-style time travel in science fiction is an example of this; it’s difficult to write, in part because of this “rigidity.” (As I know from experience :)
On the other hand, the situation with a small-but-nonzero ϵ is quite different.
With literal self-reference, one might say that “the loop only happens once”: we have to precisely match up the outgoing effects (“UP inside a TM”) with the incoming causes (“UP[1] with TMs inside”), but then we’re done. There’s no need to dive inside the UP that happens within a TM and study it, because we’re already studying it, it’s the same UP we already have at the outermost layer.
But if the UP inside a given TM is merely an approximation, then what happens inside it is not the same as the UP we have at the outermost layer. It does not contain the same TMs we already have.
It contains some approximate thing, which (and this is the key point) might need to contain an even more coarsely approximated UP inside of its approximated TMs. (Our original argument for why approximation is needed might hold, again and equally well, at this level.) And the next level inside might be even more coarsely approximated, and so on.
To determine the behavior of the outermost layer, we now need to understand the behavior of this whole series, because each layer determines what the next one up will observe.
Does the series tend toward some asymptote? Does it reach a fixed point and then stay there? What do these asymptotes, or fixed points, actually look like? Can we avoid ever reaching a level of approximation that’s no longer O(ϵ) but O(1), even as we descend through an O(1/ϵ) number of series iterations?
I have no idea! I have not thought about it much. My point is simply that you have to consider the fact that approximation is involved in order to even ask the right questions, about asymptotes and fixed points and such. Once we acknowledge that approximation is involved, we get this series structure and care about its limiting behavior; this qualitative structure is not present at all in the idealized case where we imagine the TMs have UP oracles.
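Just to make the kind of question I'm gesturing at concrete, here is a deliberately crude toy model (entirely made up, not a claim about how the real nested-approximation errors behave): whether the total error after ~1/ϵ layers stays O(ϵ) or grows to O(1) depends on how the per-layer errors compound.

```python
# Crude toy model of error accumulation across nested approximation layers.
# This is NOT a model of the actual UP-in-a-TM situation -- just an
# illustration that the answer hinges on how per-layer errors compound.

def total_error(eps, n_layers, mode):
    err = 0.0
    for k in range(n_layers):
        if mode == "additive":     # every layer contributes ~eps
            err += eps
        elif mode == "shrinking":  # deeper layers contribute geometrically less
            err += eps * (0.5 ** k)
    return err

eps = 1e-3
n = int(1 / eps)   # descend through O(1/eps) layers
print("additive: ", total_error(eps, n, "additive"))    # ~1.0: O(1), not O(eps)
print("shrinking:", total_error(eps, n, "shrinking"))   # ~2*eps: stays O(eps)
```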
I also want to say something about the size of the approximations involved.
Above, I casually described the approximation errors as O(ϵ), and imagined an ϵ→0 limit.
But in fact, we should not imagine that these errors can come as close to zero as we like. The UP is uncomputable, and involves running every TM at once[2]. Why would we imagine that a single TM can approximate this arbitrarily well?[3]
Like the gap between the finite and the infinite, or between polynomial and exponential runtime, the gap between the uncomputable and the computable is not to be trifled with.
Finally: the thing we get when we equip all the TMs with UP oracles isn’t the UP, it’s something else. (As far as I know, anyway.) That is, the self-referential quality of this system is itself only approximate (and it is by no means clear that the approximation error is small – why would it be?). If we have the UP at the bottom, inside the TMs, then we don’t have it at the outermost layer. Ignoring this distinction is, I guess, part of the “idealization,” but it is not clear to me why we should feel safe doing so.
[1] The thing outside the TMs here can’t really be the UP, but I’ll ignore this now and bring it up again at the end.
[2] In particular, running them all at once and actually using the outputs, at some (“finite”) time at which one needs the outputs for making a decision. It’s possible to run every TM inside of a single TM, but only by incurring slowdowns that grow without bound across the series of TMs; this approach won’t get you all the information you need, at once, at any finite time.
[3] There may be some result along these lines that I’m unaware of. I know there are results showing that the UP and SI perform well relative to the best computable prior/predictor, but that’s not the same thing. Any given computable prior/predictor won’t “know” whether or not it’s the best out of the multitude, or how to correct itself if it isn’t; that’s the value added by UP / SI.
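As a side note on the dovetailing point in [2], here is a sketch of my own (with Python generators standing in for TMs) of how "running every TM at once" actually has to work, and why each machine's share of the computation keeps shrinking as more machines are started:

```python
from itertools import count

def dovetail(machine):
    """machine: function i -> generator standing in for the i-th TM.
    Interleave all machines so each eventually gets arbitrarily many steps,
    but at any finite time only finitely many steps of each have run."""
    started = []
    for n in count():              # round n: start machine n, then
        started.append(machine(n)) # advance every started machine one step
        for i, m in enumerate(started):
            try:
                yield i, next(m)
            except StopIteration:
                pass               # this machine halted; skip it

# toy "machines": machine i just counts 0, 1, 2, ... forever
gen = dovetail(lambda i: count())
for _ in range(10):
    print(next(gen))
```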
Great explanation, you have found the crux. I didn’t know such problems were called singular perturbation problems.
If I thought that reasoning about the UP was definitely a singular perturbation problem in the relevant sense, then I would agree with you (that the malign prior argument doesn’t really work). I think it’s probably not, but I’m not extremely confident.
Your argument that it is a singular perturbation problem is that it involves self reference. I agree that self-reference is kinda special and can make it difficult to formally model things, but I will argue that it is often reasonable to just treat the inner approximations as exact.
The reason is: Problems that involve self reference are often easy to approximate by using more coarse-grained models as you move deeper.
One example as an intuition pump is an MCTS chess bot. In order to find a good move, it needs to think about its opponent thinking about itself, etc. We can’t compute this (because it’s exponential, not because it’s non-computable), but if we approximate the deeper layers by pretending they move randomly (!), it works quite well. Having a better move distribution works even better.
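For concreteness, here is roughly what that looks like stripped to its core (flat random rollouts rather than a full MCTS tree; the Game interface here, with is_terminal, legal_moves, apply, and utility, is just something I made up so the sketch is self-contained):

```python
import random

def rollout_value(game, state, player, n_rollouts=200):
    """Estimate how good `state` is for `player` by pretending that everyone
    below this point in the game tree moves uniformly at random."""
    total = 0.0
    for _ in range(n_rollouts):
        s = state
        while not game.is_terminal(s):
            s = game.apply(s, random.choice(game.legal_moves(s)))
        total += game.utility(s, player)
    return total / n_rollouts

def choose_move(game, state, player):
    """One layer of 'real' reasoning on top of a crude random-play model of
    all the deeper layers of I-think-that-you-think."""
    return max(game.legal_moves(state),
               key=lambda m: rollout_value(game, game.apply(state, m), player))
```

A real MCTS adds a search tree and a smarter rollout policy on top of this, but the "model the deeper layers coarsely" structure is the same.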
Maybe you’ll object that this example isn’t precisely self-reference. But the same algorithm (usually) works for finding Nash equilibria in simultaneous-move games, which do involve infinitely deep self-reference.
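And here is the same flavor of thing for a genuinely self-referential simultaneous-move game (again my own toy code): fictitious play on rock-paper-scissors, where each player best-responds to a coarse summary (the empirical average) of the other's past play instead of the full infinite regress. The empirical mixtures converge to the uniform mixed Nash equilibrium.

```python
import numpy as np

# payoff[i, j] = payoff to player 1 for playing move i against move j
# (0 = rock, 1 = paper, 2 = scissors); the game is zero-sum.
payoff = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

counts1 = np.ones(3)   # pseudo-counts of player 1's past moves
counts2 = np.ones(3)   # pseudo-counts of player 2's past moves

for _ in range(20000):
    # each player best-responds to the opponent's empirical mixture so far
    move1 = np.argmax(payoff @ (counts2 / counts2.sum()))
    move2 = np.argmax((-payoff.T) @ (counts1 / counts1.sum()))
    counts1[move1] += 1
    counts2[move2] += 1

print(counts1 / counts1.sum())   # -> approximately [1/3, 1/3, 1/3]
print(counts2 / counts2.sum())   # -> approximately [1/3, 1/3, 1/3]
```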
And another, more general way of doing essentially the same thing is using a reflective oracle, which I believe can also be used to describe a UP that can contain infinitely deep self-reference (see the last paragraph of the conclusion).[1] I think the fact that Paul worked on this suggests that he did see the potential issues with self-reference and wanted better ways to reason formally about such systems.
To be clear, I don’t think any of these examples tells us that the problem is definitely a regular perturbation problem. But I think these examples do suggest that assuming that it is regular is a very reasonable place to start, and probably tells us a lot about similar, more realistic, systems.
On the gap between the computable and uncomputable: It’s not so bad to trifle a little. Diagonalization arguments can often be avoided with small changes to the setup, and a few of Paul’s papers are about doing exactly this.
And the same argument works for a computable prior. E.g. we could make a prior over a finite set of total Turing machines, such that it still contained universes with clever agents.
Why would we imagine that a single TM can approximate this arbitrarily well?
If I remember correctly, a single TM definitely can’t approximate it arbitrarily well. But my argument doesn’t depend on this.
[1] Don’t trust me on this though, my understanding of reflective oracles is very limited.
Thanks for the link to reflective oracles!
On the gap between the computable and uncomputable: It’s not so bad to trifle a little. Diagonalization arguments can often be avoided with small changes to the setup, and a few of Paul’s papers are about doing exactly this.
I strongly disagree with this: diagonalization arguments often cannot be avoided at all, no matter how you change the setup. This is what vexed logicians in the early 20th century: no matter how you change your formal system, you won’t be able to avoid Gödel’s incompleteness theorems.
There is a trick that reliably gets you out of such paradoxes, however: switch to probabilistic mixtures. This is easily seen in a game setting: in rock-paper-scissors, there is no deterministic Nash equilibrium. Switch to mixed strategies, however, and suddenly there is always a Nash equilibrium.
This is the trick that Paul is using: he is switching from deterministic Turing machines to randomized ones. That’s fine as far as it goes, but it has some weird side effects. One of them is that if a civilization is trying to predict the universal prior that is simulating itself, and tries to send a message, then it is likely that with “reflective oracles” in place, the only message it can send is random noise. That is, Paul shows reflective oracles exist in the same way that Nash equilibria exist; but there is no control over what the reflective oracle actually is, and in paradoxical situations (like rock-paper-scissors) the Nash equilibrium is the boring “mix everything together uniformly”.
The underlying issue is that a universe that can predict the universal prior, which in turn simulates the universe itself, can encounter a grandfather paradox. It can see its own future by looking at the simulation, and then it can do the opposite. The grandfather paradox is where the universe decides to kill the grandfather of a child that the simulation predicts.
Paul solves this by only letting it see its own future using a “reflective oracle” which essentially finds a fixed point (which is a probability distribution). The fixed point of a grandfather paradox is something like “half the time the simulation shows the grandchild alive, causing the real universe to kill the grandfather; the other half the time, the simulation shows the grandfather dead and the grandchild not existing”. Such a fixed point exists even when the universe tries to do the opposite of the prediction.
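The "half the time" number is just the self-consistency condition; here is the arithmetic as a tiny sketch (mine, obviously not Paul's actual formalism):

```python
# Let p = P(the oracle reports "grandchild alive"). The universe kills the
# grandfather exactly when it hears "alive", so the actual probability of the
# grandchild being alive under that policy is 1 - p. A consistent (fixed-point)
# report must satisfy p = 1 - p.

consistent = [p / 100 for p in range(101) if abs(p / 100 - (1 - p / 100)) < 1e-9]
print(consistent)   # -> [0.5]: the only consistent report is a fair coin flip
```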
The thing is, this fixed point is boring! Repeat this enough times, and it eventually just says “well my prediction about your future is random noise that doesn’t have to actually come true in your own future”. I suspect that if you tried to send a message through the universal prior in this setting, the message would consist of essentially uniformly random bits. This would depend on the details of the setup, I guess.
I strongly disagree with this: diagonalization arguments often cannot be avoided at all, no matter how you change the setup. …
There is a trick that reliably gets you out of such paradoxes, however: switch to probabilistic mixtures.
Fair enough, the probabilistic mixtures thing was what I was thinking of as a change of setup, but reasonable to not consider it such.
the message would consist of essentially uniformly random bits
I don’t see how this is implied. If a fact is consistent across levels, and determined in a non-paradoxical way, can’t this become a natural fixed point that can be “transmitted” across levels? And isn’t this kind of knowledge all that is required for the malign prior argument to work?
The problem is that the act of leaving the message depends on the output of the oracle (otherwise you wouldn’t need the oracle at all, but you also would not know how to leave a message). If the behavior of the machine depends on the oracle’s actions, then we have to be careful with what the fixed point will be.
For example, if we try to fight the oracle and do the opposite, we get the “noise” situation from the grandfather paradox.
But if we try to cooperate with the oracle and do what it predicts, then there are many different fixed points and no telling which the oracle would choose (this is not specified in the setting).
It would be great to see a formal model of the situation. I think any model in which such message transmission would work is likely to require some heroic assumptions which don’t correspond much to real life.
If the only transmissible message is essentially uniformly random bits, then of what value is the oracle?
I claim the message can contain lots of information. E.g. if there are 2^100 potential actions, but only 2 fixed points, then 99 bits have been transmitted (relative to uniform).
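(Spelling out the arithmetic behind that number, in case it helps:)

```python
import math
# entropy drop when 2**100 equally likely actions collapse to 2 equally
# likely fixed points: from 100 bits of uncertainty down to 1 bit
print(math.log2(2**100) - math.log2(2))   # -> 99.0 bits transmitted
```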
The rock-paper-scissors example is relatively special, in that the oracle can’t narrow down the space of actions at all.
The UP situation looks to me to be more like the first situation than the second.
It would help to have a more formal model, but as far as I can tell the oracle can only narrow down its predictions of the future to the extent that those predictions are independent of the oracle’s output. That is to say, if the people in the universe ignore what the oracle says, then the oracle can give an informative prediction.
This would seem to exactly rule out any type of signal which depends on the oracle’s output, which is precisely the types of signals that nostalgebraist was concerned about.
That can’t be right in general. Normal Nash equilibria can narrow down predictions of actions, e.g. in a competition game. This is despite each player’s decision being dependent on the other player’s action.
That’s fair, yeah
We need a proper mathematical model to study this further. I expect it to be difficult to set up because the situation is so unrealistic/impossible as to be hard to model. But if you do have a model in mind I’ll take a look.