It sounds to me like ruling this out requires an assumption that the correlations of an action are the same as the correlations of an earlier self-modifying action taken to enforce that later action.
I would guess that assumption would be sufficient to defeat my counter-example, yeah.
I do think this is a big assumption. Definitely not one that I’d want to generally assume for practical purposes, even if it makes for a nicer theory of decision theory. But it would be super interesting if someone could make a proper defense of it typically being true in practice.
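To state the assumption a bit more explicitly (this is just my rough paraphrase, so take the formalization with a grain of salt): for any later action A, learning that the agent takes A at the later time should carry exactly the same evidence about everything else as learning that its earlier self self-modified so as to enforce A. Schematically,

$$P(\,\cdot \mid \text{take } A \text{ at } t_1) \;=\; P(\,\cdot \mid \text{self-modify at } t_0 \text{ to enforce } A \text{ at } t_1).$$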
E.g.: Is it really true that a human’s decision about whether or not to program a seed AI to take action A has the same correlations as that same superintelligence deciding whether or not to take action A 1000 years later while using a Jupiter brain for its computation? Intuitively, I’d say that the human would correlate mostly with other humans and other evolved species, and that the superintelligence would mostly correlate with other superintelligences, and it’d be a big deal if that weren’t true.
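To make that intuition concrete, here’s a minimal toy sketch (every number, group name, and correlation weight below is invented purely for illustration): a crude evidential-style calculation in which the same costly action comes out differently depending on whether it’s the human’s early decision or the superintelligence’s later one, just because the two decisions correlate with different populations.

```python
# Purely illustrative toy model (every number below is invented): an agent decides
# whether to take a costly "cooperative" action A. Under a crude evidential-style
# calculation, taking A is evidence that correlated agents elsewhere also take A,
# and the agent benefits from *their* cooperation.

COST_OF_A = 1.0
BASELINE = 0.5  # prior probability that each group takes A

# Benefit to the agent if a given group takes A (in this toy, superintelligences
# control far more resources, so their cooperation is worth much more).
BENEFIT = {"evolved species": 0.3, "superintelligences": 10.0}

def expected_utility(action, correlation):
    """EDT-style expected utility: taking A shifts the credence that each
    correlated group also takes A, in proportion to the correlation weight."""
    eu = 0.0
    for group, weight in correlation.items():
        shift = weight * (1 - BASELINE) if action == "A" else -weight * BASELINE
        eu += (BASELINE + shift) * BENEFIT[group]
    return eu - (COST_OF_A if action == "A" else 0.0)

# The human's decision correlates mostly with other evolved species; the
# superintelligence's decision 1000 years later correlates mostly with other
# superintelligences.
deciders = {
    "human programming the seed AI to take A":
        {"evolved species": 0.9, "superintelligences": 0.05},
    "superintelligence choosing A 1000 years later":
        {"evolved species": 0.05, "superintelligences": 0.9},
}

for decider, correlation in deciders.items():
    eu_a = expected_utility("A", correlation)
    eu_not_a = expected_utility("not-A", correlation)
    prefers = "A" if eu_a > eu_not_a else "not-A"
    print(f"{decider}: EU(A)={eu_a:.2f}, EU(not-A)={eu_not_a:.2f} -> prefers {prefers}")
```

Under these made-up numbers the early commitment and the later action favor opposite choices, which is exactly the sort of divergence the assumption has to rule out.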
Here’s a different way of framing it: if we don’t make this assumption, is there some useful generalization of UDT which emerges? Or, having not made this assumption, are we stuck in a quagmire where we can’t really say anything useful?
I think of these sorts of ‘technical assumptions’ needed for nice DT results as “sanity checks”:
I think we need to make several significant assumptions like this in order to get nice theoretical DT results.
These nice DT results won’t precisely apply to the real world; however, they do show that the DT being analyzed at least behaves sanely in these ‘easier’ cases.
So it seems like the natural thing to do is to prove tiling results, learning results, etc. under the necessary technical assumptions, with some concern for how restrictive the assumptions are (broader sanity checks being better), and then also check whether behavior is “at least somewhat reasonable” in other cases.
So if UDT fails to tile when we remove these assumptions, but at least appears to choose its successor in a reasonable way given the situation, this would count as a success.
Better, of course, if we can find the more general DT which tiles under weaker assumptions. I do think it’s quite plausible that UDT needs to be generalized; I just expect my generalization of UDT will still need to make an assumption which rules out your counterexample to UDT.
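For concreteness, here’s the kind of trivial ‘sanity check’ I have in mind, as a toy sketch (the environment, payoff table, and tiling criterion are all made up for illustration, not any existing formal result): enumerate a few successor policies, score each by the current agent’s own expected utility, and check whether the agent’s own policy is among the optimal successors.

```python
# Toy "tiling" sanity check (everything here is invented for illustration):
# the current agent considers handing control to a successor policy, scores each
# candidate by its own expected utility, and we check whether (a copy of) its own
# policy is among the optimal successors.
import itertools

ACTIONS = ["a", "b"]
OBSERVATIONS = ["x", "y"]

# Toy payoff table for a single round.
PAYOFF = {("x", "a"): 1.0, ("x", "b"): 0.0,
          ("y", "a"): 0.0, ("y", "b"): 2.0}

def expected_utility(policy):
    """Expected payoff of a policy (an observation -> action map),
    assuming observations are uniformly distributed."""
    return sum(PAYOFF[(obs, policy[obs])] for obs in OBSERVATIONS) / len(OBSERVATIONS)

# Enumerate every deterministic policy.
all_policies = [dict(zip(OBSERVATIONS, acts))
                for acts in itertools.product(ACTIONS, repeat=len(OBSERVATIONS))]

# The current agent is the expected-utility maximizer...
current = max(all_policies, key=expected_utility)

# ...and it picks a successor by the same criterion: the successor that maximizes
# the *current* agent's expected utility.
best_value = max(expected_utility(p) for p in all_policies)
optimal_successors = [p for p in all_policies if expected_utility(p) == best_value]

tiles = current in optimal_successors  # dict equality suffices for this check
print("current policy:", current)
print("tiles (its own policy is an optimal successor):", tiles)
```

The check passes trivially here because this toy environment has none of the correlation structure we’ve been discussing; the interesting question is whether some analogue of it still passes once that structure is added.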