But, the gist of your post seems to be: “Since coming up with UDT, we ran into these problems, made no progress, and are apparently at a dead end. Therefore, UDT might have been the wrong turn entirely.”
This is a bit stronger than how I would phrase it, but basically yes.
On the other hand, my view is: Since coming up with those problems, we made a lot of progress on agent theory within the LTA
I tend to be pretty skeptical of new ideas. (This backfired spectacularly once, when I didn’t pay much attention to Satoshi when he contacted me about Bitcoin, but I think it has generally served me well.) My experience with philosophical questions is that even when some approach looks a stone’s throw away from a final solution to some problem, a bunch of new problems pop up and show that we’re still quite far away. With an approach that is still as early-stage as yours, I just think there’s quite a good chance it doesn’t work out in the end, or gets stuck somewhere on a hard problem. (Also, some people who have dug into the details don’t seem as optimistic that it is the right approach.) So I’m reluctant to decrease my probability of “UDT was a wrong turn” too much based on it.
The rest of your discussion about 2TDT-1CDT seems plausible to me, although of course it depends on whether the math works out, on doing something about monotonicity, and on a solution to the problem of how to choose one’s IBH prior. (If the solution was something like “it’s subjective/arbitrary” that would be pretty unsatisfying from my perspective.)
...the problem of how to choose one’s IBH prior. (If the solution was something like “it’s subjective/arbitrary” that would be pretty unsatisfying from my perspective.)
It seems clear to me that the prior is subjective. Like with Solomonoff induction, I expect there to exist something like the right asymptotic for the prior (i.e. an equivalence class of priors under the equivalence relation where μ and ν are equivalent when there exists some C>0 s.t. μ≤Cν and ν≤Cμ), but not a unique correct prior, just like there is no unique correct UTM. In fact, my arguments about IBH already rely on the asymptotic of the prior to some extent.
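To make the analogy with Solomonoff induction concrete, here is a sketch using the standard invariance theorem for Kolmogorov complexity (nothing IBH-specific): swapping the universal machine changes the induced prior only by a multiplicative constant, which is exactly the equivalence relation above.

```latex
\[
K_U(x) \le K_V(x) + c_{U,V}, \qquad K_V(x) \le K_U(x) + c_{V,U} \quad \text{for all } x .
\]
Writing the induced prior as $m_U(x) = 2^{-K_U(x)}$ and exponentiating,
\[
m_V(x) \le 2^{c_{U,V}}\, m_U(x), \qquad m_U(x) \le 2^{c_{V,U}}\, m_V(x),
\]
so with $C = 2^{\max(c_{U,V},\, c_{V,U})}$ we have $m_U \le C\, m_V$ and $m_V \le C\, m_U$: the two priors lie in the same equivalence class.
```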
One way to view the non-uniqueness of the prior is through an evolutionary perspective: agents with prior X are likely to evolve/flourish in universes sampled from prior X, while agents with prior Y are likely to evolve/flourish in universes sampled from prior Y. No prior is superior across all universes: there’s no free lunch.
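A toy numerical illustration of the no-free-lunch point (my numbers, not anything from the IBH framework):

```latex
\[
X = (0.9,\ 0.1), \qquad Y = (0.1,\ 0.9) \quad \text{over universes } \{u_1, u_2\}.
\]
Agent $A_X$ acts as if $u_1$ obtains (score $1$ in $u_1$, $0$ in $u_2$); agent $A_Y$ is the mirror image. Then
\[
\mathbb{E}_X[\mathrm{score}(A_X)] = 0.9 > \mathbb{E}_X[\mathrm{score}(A_Y)] = 0.1,
\qquad
\mathbb{E}_Y[\mathrm{score}(A_Y)] = 0.9 > \mathbb{E}_Y[\mathrm{score}(A_X)] = 0.1,
\]
while averaging uniformly over $\{u_1, u_2\}$ gives both agents $0.5$: neither prior dominates across all universes.
```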
For the purpose of AI alignment, the solution is some combination of (i) learn the user’s prior and (ii) choose some intuitively appealing measure of description complexity, e.g. length of lambda-term ((i) is insufficient by itself because you need some ur-prior to learn the user’s prior). The claim is that different reasonable choices in (ii) will lead to similar results.
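As a rough illustration of how (i) and (ii) could fit together, here is a toy Python sketch under my own simplifying assumptions (the function names, the use of raw description length in place of lambda-term length, and the likelihood interface are all hypothetical, not the actual proposal):

```python
# Toy sketch: an "ur-prior" that weights candidate hypotheses about the user's
# prior by a crude description-length complexity measure, then updates on
# observations of the user. Everything here is illustrative, not a real design.

def ur_prior_weight(description: str) -> float:
    """Stand-in for 'length of lambda-term': weight 2^(-length of description)."""
    return 2.0 ** (-len(description))

def posterior_over_user_priors(candidates, observations, likelihood):
    """candidates: iterable of (description, hypothesis) pairs.
    likelihood(hypothesis, observations): probability of the observed user
    behaviour under that hypothesis about the user's prior."""
    weights = {
        desc: ur_prior_weight(desc) * likelihood(hyp, observations)
        for desc, hyp in candidates
    }
    total = sum(weights.values())
    if total == 0:
        return weights  # no candidate hypothesis explains the data
    return {desc: w / total for desc, w in weights.items()}
```

If two complexity measures agree up to an additive constant (as Kolmogorov-style measures do), the induced weights differ only by a multiplicative constant, which is one way of reading the claim that reasonable choices in (ii) lead to similar results.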
Given all that, I’m not sure what’s still unsatisfying. Is there any reason to believe something is missing in this picture?