I admit I’m not closely familiar with Tegmark’s views, but I know he has considered two distinct things that might be called “the Level IV multiverse”:
a “mathematical universe” in which all mathematical constructs exist
a more restrictive “computable universe” in which only computable things exist

(I’m getting this from his paper here.)
In particular, Tegmark speculates that the computable universe is “distributed” following the UP (as you say in your final bullet point). This would mean e.g. that one shouldn’t be too surprised to find oneself living in a TM of any given K-complexity, despite the fact that “almost all” TMs have higher complexity (in the same sense that “almost all” natural numbers are greater than any given number n).
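To make that concrete (this is just the standard formalism for the UP, not anything specific to Tegmark or to this thread): writing $\ell(p)$ for the length of program $p$ on a universal prefix machine $U$, the UP gives an output $x$ weight

$$m(x) \;=\; \sum_{p\,:\,U(p)=x} 2^{-\ell(p)} \;\approx\; 2^{-K(x)}$$

(up to a multiplicative constant, by the coding theorem). So a world of K-complexity $k$ keeps weight on the order of $2^{-k}$: small, but a fixed positive amount for any fixed $k$. By contrast, under the counting measure the fraction of programs of length $\le k$ among all programs of length $\le N$ is below $2^{\,k+1-N}$, which goes to $0$ as $N \to \infty$; that is the sense in which “almost all” TMs are more complex than any given one.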
When you say “Tegmark IV,” I assume you mean the computable version—right? That’s the one Tegmark says might be distributed like the UP. If we’re in some uncomputable world, the UP won’t help us “locate” ourselves, but if the world has to be computable then we’re good[1].
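(In this framing, “locating ourselves” would just mean conditioning the UP on our observations so far. Schematically, and using standard Solomonoff-induction notation rather than anything from this discussion:

$$P(p \mid x_{1:t}) \;\propto\; 2^{-\ell(p)} \cdot \mathbf{1}\!\left[\,U(p)\text{’s output begins with } x_{1:t}\,\right],$$

i.e. a posterior over “which TM am I in” that keeps every program consistent with what we’ve seen, weighted by its length. The malignity worry is then that some of those surviving programs contain consequentialists who arrange their outputs precisely to sway whoever performs this update.)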
With that out of the way, here is why this argument feels off to me.
First, Tegmark IV is an ontological idea, about what exists at the “outermost layer of reality.” There’s no one outside of Tegmark IV who’s using it to predict something else; indeed, there’s no one outside of it at all; it is everything that exists, full stop.
“Okay,” you might say, “but wait—we are all somewhere inside Tegmark IV, and trying to figure out just which part of it we’re in. That is, we are all attempting to answer the question, ‘what happens when I update the UP on my own observations?’ So we are all effectively trying to ‘make decisions on the basis of the UP,’ and vulnerable to its weirdness, insofar as it is weird.”
Sure. But in this picture, “we” (the UP-using dupes) and “the consequentialists” are on an even footing: we are both living in some TM or other, and trying to figure out which one.
In which case we have to ask: why would such entities ever come to such a destructive, bad-for-everyone (acausal) agreement?
Presumably the consequentialists don’t want to be duped; they would prefer to be able to locate themselves in Tegmark IV, and make decisions accordingly, without facing such irritating complications.
But, by writing to “output channels”[2] in the malign manner, the consequentialists are simply causing the multiverse to be the sort of place where those irritating complications happen to beings like them (beings in TMs trying to figure out which TM they’re in) -- and what’s more, they’re expending time and scarce resources to “purchase” this undesirable state of affairs!
In order for malignity to be worth it, we need something to break the symmetry between “dupes” (UP users) and “con men” (consequentialists), separating the world into two classes, so that the would-be con men can plausibly reason, “I may act in a malign way without the consequences raining down directly on my head.”
We have this sort of symmetry-breaker in the version of the argument that postulates, by fiat, a “UP-using dupe” somewhere, for some reason, and then proceeds to reason about the properties of the (potentially very different, not “UP-using”?) guys inside the TMs. A sort of struggle between conniving, computable mortals and overly-innocent, uncomputable angels. Here we might argue that things really will go wrong for the angels, that they will be the “dupes” of the mortals, who are not like them and who do not themselves get duped. (But I think this form of the argument has other problems, the ones I described in the OP.)
But if the reason we care about the UP is simply that we’re all in TMs, trying to find our location within Tegmark IV, then we’re all in this together. We can just notice that we’d all be better off if no one did the malign thing, and then no one will do it[3].
In other words, in your picture (and Paul’s), we are asked to imagine that the computable world abounds with malign, wised-up consequentialist con men, who’ve “read Paul’s post” (i.e. re-derived similar arguments) and who appreciate the implications. But if so, then where are the marks? If we’re not postulating some mysterious UP-using angel outside of the computable universe, then who is there to deceive? And if there’s no one to deceive, why go to the trouble?
[1] I don’t think this distinction actually matters for what’s below; I just mention it to make sure I’m following you.

[2] I’m picturing a sort of acausal I’m-thinking-about-you-thinking-about-me situation in which, although I might never actually read what’s written on those channels (after all, I am not “outside” Tegmark IV looking in), nonetheless I can reason about what someone might write there, and thus it matters what is actually written there. I’ll only conclude “yeah, that’s what I’d actually see if I looked” if the consequentialists convince me they’d really pull the trigger, even if they’re only pulling the trigger for the sake of convincing me, and we both know I’ll never really look.

[3] Note that, in the version of this picture that involves abstract generalized reasoning rather than simulation of specific worlds, defection is fruitless: if you are trying to manipulate someone who is just thinking about whether beings will do X as a general rule, you don’t get anything out of raising your hand and saying “well, in reality, I will!” No one will notice; they aren’t actually looking at you, ever, just at the general trend. And of course “they” know all this, which raises “their” confidence that no one will raise their hand; and “you” know that “they” know, which makes “you” less interested in raising that same hand; and so forth.
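(A tiny numerical illustration of the point in [3], with invented numbers: if the reasoner only ever consults the population-level rate at which agents do X, one agent “raising their hand” moves that rate by a negligible amount, so there is nothing to gain by being the lone exception.)

```python
# Invented numbers: the abstract reasoner only looks at the overall rate of X.
population = 10**12        # hypothetical count of relevantly similar agents
defectors_now = 0          # the candidate equilibrium: nobody does X

rate_if_i_hold_back = defectors_now / population
rate_if_i_defect = (defectors_now + 1) / population

# My defection shifts the statistic by 1e-12 -- effectively invisible.
print(rate_if_i_defect - rate_if_i_hold_back)
```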
When you say “Tegmark IV,” I assume you mean the computable version—right?
Yep.
We have this sort of symmetry-breaker in the version of the argument that postulates, by fiat, a “UP-using dupe” somewhere, for some reason
Correction: on my model, the dupe is also using an approximation of the UP, not the UP itself. I. e., it doesn’t need to be uncomputable. The difference between it and the con men is just the naivety of the design. It generates guesses regarding what universes it’s most likely to be in (potentially using abstract reasoning), but then doesn’t “filter” these universes; doesn’t actually “look inside” and determine if it’s a good idea to use a specific universe as a model. It doesn’t consider the possibility of being manipulated through it; doesn’t consider the possibility that a given universe contains daemons.
I. e.: the real difference is that the “dupe” is using causal decision theory, not functional decision theory.
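To spell out the “filtering” distinction in code-shaped form (a toy sketch with entirely made-up structure, not a claim about how any real agent is built): both agents form the same length-weighted posterior over candidate world-programs; the naive one trusts it directly, while the filtered one inspects each surviving hypothesis, e.g. for embedded adversarial optimizers, before using it as a model.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class WorldProgram:
    """Hypothetical stand-in for a candidate world-program."""
    name: str
    length: int                 # description length, in bits
    matches_observations: bool  # does it reproduce what we've seen so far?
    contains_daemons: bool      # knowable only by actually "looking inside"

def up_posterior(worlds: List[WorldProgram]) -> Dict[str, float]:
    """Toy approximate-UP posterior: 2^-length weight on every candidate
    consistent with observations, renormalized."""
    weights = {w.name: 2.0 ** -w.length for w in worlds if w.matches_observations}
    total = sum(weights.values())
    return {name: wt / total for name, wt in weights.items()}

def naive_agent(worlds: List[WorldProgram]) -> Dict[str, float]:
    # The "dupe": takes the posterior at face value, never inspecting hypotheses.
    return up_posterior(worlds)

def filtered_agent(worlds: List[WorldProgram]) -> Dict[str, float]:
    # The careful agent: same posterior, but hypotheses flagged as containing
    # adversarial optimizers are dropped before being trusted as models.
    return up_posterior([w for w in worlds if not w.contains_daemons])

# Example: a shorter, manipulative hypothesis dominates the naive posterior.
worlds = [
    WorldProgram("honest-but-long", length=12, matches_observations=True,
                 contains_daemons=False),
    WorldProgram("consequentialist-sim", length=9, matches_observations=True,
                 contains_daemons=True),
]
print(naive_agent(worlds))     # ~89% of the mass on the daemon-containing world
print(filtered_agent(worlds))  # all of the mass on the honest world
```

The interesting work is of course hidden in how one would actually implement the contains_daemons check; the sketch only shows where such a check would slot in.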
We can just notice that we’d all be better off if no one did the malign thing, and then no one will do it
I think that’s plausible: that there aren’t actually that many “UP-using dupes” in existence, so the con men don’t actually care to stage these acausal attacks.
But: if that is the case, it’s because the entities designing/becoming powerful agents considered the possibility of con men manipulating the UP, and so made sure that they’re not just naively using the unfiltered (approximation of the) UP.
That is: yes, it seems likely that the equilibrium state of affairs here is “nobody is actually messing with the UP”. But it’s because everyone knows the UP could be messed with in this manner, so no-one is using it (nor its computationally tractable approximations).
It might also not be the case, however. Maybe there are large swathes of reality populated by powerful yet naive agents, such that whatever process constructs them (some alien evolution analogue?) doesn’t teach them good decision theory at all. So when they figure out Tegmark IV and the possibility of acausal attacks/being simulation-captured, they give in to whatever “demands” are posed to them. (I. e., there might be entire “worlds of dupes”, somewhere out there among the mathematically possible.)
That said, the “dupe” label actually does apply to a lot of humans, I think. I expect that a lot of people, if they ended up believing that they’re in a simulation and that the simulators would do bad things to them unless they do X, would do X. The acausal con men would only care to actually do it, however, if a given person is (1) in a position where they could do something with large-scale consequences, (2) smart enough to consider the possibility of simulation-capture, and (3) not smart enough to ignore blackmail.
Cool, it sounds like we basically agree!

But: if that is the case, it’s because the entities designing/becoming powerful agents considered the possibility of con men manipulating the UP, and so made sure that they’re not just naively using the unfiltered (approximation of the) UP.
I’m not sure of this. It seems at least possible that we could get an equilibrium where everyone does use the unfiltered UP (in some part of their reasoning process), trusting that no one will manipulate them because (a) manipulative behavior is costly and (b) no one has any reason to expect anyone else will reason differently from them, so if you choose to manipulate someone else you’re effectively choosing that someone else will manipulate you.
Perhaps I’m misunderstanding you. I’m imagining something like choosing one’s own decision procedure in TDT, where one ends up choosing a procedure that involves “the unfiltered UP” somewhere, and which doesn’t do manipulation. (If your procedure involved manipulation, so would your copy’s procedure, and you would get manipulated; you don’t want this, so you don’t manipulate, nor does your copy.) But you write
the real difference is that the “dupe” is using causal decision theory, not functional decision theory
whereas it seems to me that TDT/FDT-style reasoning is precisely what allows us to “naively” trust the UP, here, without having to do the hard work of “filtering.” That is: this kind of reasoning tells us to behave so that the UP won’t be malign; hence, the UP isn’t malign; hence, we can “naively” trust it, as though it weren’t malign (because it isn’t).
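To put toy numbers on the symmetry argument (payoffs invented purely for illustration): treat “manipulate the UP” as a move in a symmetric game among agents running the same decision procedure. A CDT-style agent, holding everyone else’s choice fixed, finds manipulation dominant; an FDT-style agent, recognizing that its choice and its counterparts’ choices are the same logical fact, compares the two symmetric outcomes and abstains.

```python
# Invented payoffs for a symmetric "manipulate the UP" game.
# Manipulating costs C and yields G against a trusting target;
# being manipulated costs L.
C, G, L = 1.0, 3.0, 5.0

def payoff(i_manipulate: bool, other_manipulates: bool) -> float:
    gain = G if i_manipulate else 0.0
    cost = C if i_manipulate else 0.0
    harm = L if other_manipulates else 0.0
    return gain - cost - harm

def cdt_choice(assumed_other: bool) -> bool:
    # CDT: treat the other agent's move as fixed; manipulate iff it helps then.
    return payoff(True, assumed_other) > payoff(False, assumed_other)

def fdt_choice() -> bool:
    # FDT: my move and the symmetric counterpart's move are the same decision,
    # so the only reachable outcomes are (manipulate, manipulate) vs (abstain, abstain).
    return payoff(True, True) > payoff(False, False)

print(cdt_choice(False), cdt_choice(True))  # True True: manipulation is dominant for CDT
print(fdt_choice())                         # False: the FDT agent abstains
```

Under these made-up numbers the game is just a prisoner’s dilemma, which is the point: the “hence the UP isn’t malign, hence we can naively trust it” loop above is the cooperative equilibrium with numbers attached.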
More broadly, though—we are now talking about something that I feel like I basically understand and basically agree with, and just arguing over the details, which is very much not the case with standard presentations of the malignity argument. So, thanks for that.
I’m not sure of this. It seems at least possible that we could get an equilibrium where everyone does use the unfiltered UP (in some part of their reasoning process), trusting that no one will manipulate them because (a) manipulative behavior is costly and (b) no one has any reason to expect anyone else will reason differently from them, so if you choose to manipulate someone else you’re effectively choosing that someone else will manipulate you.
Fair point! I agree.

Thanks.