However, there is no tiling theorem for UDT that I am aware of, which means we don’t know whether UDT is reflectively consistent; it’s only a conjecture.
I think this conjecture is probably false for reasons described in this section of “When does EDT seek evidence about correlations?”. The section offers an argument for why son-of-EDT isn’t UEDT, but I think it generalizes to an argument for why son-of-UEDT isn’t UEDT.
Briefly: UEDT-at-timestep-1 is making a different decision than UEDT-at-timestep-0. This means that its decision might be correlated (according to the prior) with some facts which UEDT-at-timestep-0’s decision isn’t correlated with. From the perspective of UEDT-at-timestep-0, it’s bad to let UEDT-at-timestep-1 make decisions on the basis of correlations with things that UEDT-at-timestep-0 can’t control.
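To make the shape of this concrete, here’s a minimal toy sketch (my own stylized numbers and setup, not a faithful reproduction of the argument in the linked section): suppose a hidden fact F is correlated under the prior with the action UEDT outputs at timestep 1, but not with a timestep-0 commitment that would enforce the same action. Conditioning on the later action then pulls the timestep-1 decision toward an option the timestep-0 agent would not want to delegate.

```python
# Toy illustration (stylized numbers): a hidden fact F is correlated, under the
# prior, with the action UEDT outputs at timestep 1, but NOT with a timestep-0
# commitment that would enforce that same action.

# Prior over the hidden fact F.
P_F = {True: 0.5, False: 0.5}

# Assumed correlations in this toy:
#  - conditioning on the timestep-1 action shifts beliefs about F,
#  - conditioning on a timestep-0 commitment does not.
P_F_given_t1_action = {"A": {True: 0.9, False: 0.1},
                       "B": {True: 0.1, False: 0.9}}
P_F_given_t0_commit = {"A": dict(P_F), "B": dict(P_F)}

def utility(action, f):
    # Action A pays off only if F is true; B is a safe middling option.
    if action == "A":
        return 1.0 if f else 0.0
    return 0.6

def edt_value(action, conditional):
    # Evidential expected utility: condition beliefs about F on the action.
    return sum(p * utility(action, f) for f, p in conditional[action].items())

# UEDT deciding at timestep 1 conditions on its own (later) action:
t1_choice = max(["A", "B"], key=lambda a: edt_value(a, P_F_given_t1_action))

# UEDT deciding at timestep 0 which action to enforce conditions on the commitment:
t0_choice = max(["A", "B"], key=lambda a: edt_value(a, P_F_given_t0_commit))

print(t1_choice)  # "A": the 0.9 correlation with F makes A look great at timestep 1
print(t0_choice)  # "B": without that correlation, timestep 0 prefers to lock in B
```

In this toy, UEDT-at-timestep-0 would rather lock in B than leave the choice to UEDT-at-timestep-1, which would chase the correlation toward A; so the agent it builds for timestep 1 is not UEDT.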
I haven’t analyzed your argument yet, but: tiling arguments will always depend on assumptions. Really, it’s a question of when something tiles, not whether. So, if you’ve got a counterexample to tiling, a natural next question is what assumptions we could make to rule it out, and how unfortunate it is to need those assumptions.
I might not have understood adequately yet, but it sounds to me like ruling this out requires an assumption that the correlations of an action are the same as the correlations of an earlier self-modifying action that enforces that later action. This is a big assumption, but at the same time, it is the sort of assumption I would expect to need in order to justify UDT. As Eliezer put it, tiling results need to assume that the environment only cares about what policy we implement, not our “rituals of cognition” that compute those policies. To me, an earlier act of self-modification vs. a later decision is a difference in “ritual of cognition” rather than a difference in policy.
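One hypothetical way to write that assumption down (my own formalization, not something from the post or from Eliezer): for every fact X the prior cares about and every action a,

P(X | commit at timestep 0 to enforce a) = P(X | take a at timestep 1).

In other words, the environment only gets to respond to which policy ends up implemented, not to which ritual of cognition, early or late, settled on it; under an assumption like this, the earlier and later decisions carry the same evidence, so the earlier agent has no evidential reason to pre-empt the later one.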
So, I need to understand the argument better, but it seems to me like this kind of counterexample doesn’t significantly wound the spirit of UDT.
it sounds to me like ruling this out requires an assumption that the correlations of an action are the same as the correlations of an earlier self-modifying action that enforces that later action.
I would guess that assumption would be sufficient to defeat my counterexample, yeah.
I do think this is a big assumption. Definitely not one that I’d want to generally assume for practical purposes, even if it makes for a nicer theory of decision theory. But it would be super interesting if someone could make a proper defense of it typically being true in practice.
E.g.: Is it really true that a human’s decision about whether or not to program a seed AI to take action A has the same correlations as the resulting superintelligence’s decision about whether or not to take action A 1000 years later, while using a Jupiter brain for its computation? Intuitively, I’d say that the human would correlate mostly with other humans and other evolved species, and that the superintelligence would mostly correlate with other superintelligences, and it’d be a big deal if that wasn’t true.
Here’s a different way of framing it: if we don’t make this assumption, is there some useful generalization of UDT which emerges? Or, having not made this assumption, are we stuck in a quagmire where we can’t really say anything useful?
I think about these sorts of ‘technical assumptions’ needed for nice DT results as “sanity checks”:
I think we need to make several significant assumptions like this in order to get nice theoretical DT results.
These nice DT results won’t precisely apply to the real world; however, they do show that the DT being analyzed at least behaves sanely when it is in these ‘easier’ cases.
So it seems like the natural thing to do is prove tiling results, learning results, etc. under the necessary technical assumptions, with some concern for how restrictive the assumptions are (broader sanity checks being better), and then also check whether behavior is “at least somewhat reasonable” in other cases.
So if UDT fails to tile when we remove these assumptions, but at least appears to choose its successor in a reasonable way given the situation, this would count as a success.
Better, of course, if we can find the more general DT which tiles under weaker assumptions. I do think it’s quite plausible that UDT needs to be generalized; I just expect my generalization of UDT will still need to make an assumption which rules out your counterexample to UDT.