keith_wynroe comments on A Simple Toy Coherence Theorem

keith_wynroe 10 Nov 2024 15:42 UTC
1 point
0
I feel like this could branch out into a lot of small disagreements, but in the interest of keeping it streamlined:
One of the consequences of this, however, is that this prefix-based encoding method is only optimal for functions whose prefix-free encodings (i.e. encodings that cannot be partitioned into substrings such that one of the substrings encodes another UTM) in UTM1 and UTM2 differ in length by more than len(P). And, since len(P) is a measure of UTM2′s complexity relative to UTM1, it follows directly that a UTM2 whose “coding scheme” is such that a function whose prefix-free encoding in UTM2 differs in length from its prefix-free encoding in UTM1 by some large constant (say, ~2^10^80), P itself must be on the order of 2^10^80—in other words, UTM2 must have an astronomical complexity relative to UTM1.
I agree with all of this, and wasn’t gesturing at anything related to it, so I think we’re talking past each other. My point was simply that two UTMs, even with not-very-large prefix encodings, can wind up with extremely different priors, but I don’t think that’s too relevant to your main point.
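To make the point about priors concrete, here is a rough sketch using the standard invariance bound. The notation (K_U for prefix-free description length on machine U, c_{12} and c_{21} for the lengths of the two translating prefixes) and the 1000-bit figure are my own illustrative assumptions, not anything fixed earlier in the thread.

```latex
% Invariance bound in both directions (c_{12}: length of a prefix that lets
% UTM1 simulate UTM2; c_{21}: the reverse). Both constants are assumed here.
\[
  K_{U_1}(x) \le K_{U_2}(x) + c_{12},
  \qquad
  K_{U_2}(x) \le K_{U_1}(x) + c_{21}.
\]
% Taking the prior weight of x on machine U to be 2^{-K_U(x)}, the two
% bounds pin the ratio of prior weights only up to exponential factors:
\[
  2^{-c_{12}} \;\le\; \frac{2^{-K_{U_1}(x)}}{2^{-K_{U_2}(x)}} \;\le\; 2^{c_{21}}.
\]
% Illustrative numbers: with c_{12} = c_{21} = 1000 bits, the bound allows
% the two priors to disagree on a given x by a factor of up to
\[
  2^{1000} \approx 1.07 \times 10^{301}.
\]
```

So a prefix that is tiny by the standards of ~2^10^80 still leaves room for the two priors to disagree by hundreds of orders of magnitude on particular hypotheses. The bound only says how far apart they can be, not that they must be.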
For any physically realizable universal computational system, that system can be analogized to UTM1 in the above analysis. If you have some behavioral policy that is e.g. deontological in nature, that behavioral policy can in principle be recast as an optimization criterion over universe histories; however, this criterion will in all likelihood have a prefix-free description in UTM1 of length ~2^10^80. And, crucially, there will be no UTM2 in whose encoding scheme the criterion in question has a prefix-free description of much less than ~2^10^80, without that UTM2 itself having a description complexity of ~2^10^80 relative to UTM1—meaning, there is no physically realizable system that can implement UTM2.
I think I disagree with almost all of this. You can pick some gerrymandered extant physical system right now that ends up looking like a garbled world-history optimizer, and I doubt it would take a description of length ~2^10^80 to specify it. But granting that these systems would in fact have astronomical prefixes, I think this is a ponens/tollens situation: if these systems actually have a huge prefix, that tells me that the encoding schemes of some physically realisable systems are deeply incompatible with mine, not that those systems, which are out there right now, aren’t physically realisable.
I imagine an objection is that these physical systems are not actually world-history optimizers and are going to be much more compressible than I’m making them out to be, so your argument goes through. In that case I’m fine with it; this just seems like a differing definition of what counts as two schemes acting “virtually identically” w.r.t. optimization criteria. If your argument is valid but draws this similarity so broadly that it includes e.g. random chunks of a rock floating through space, then I’m happy to concede that. It seems quite trivial, and not at all worrying from the original perspective of bounding the kinds of optimization criteria an AI might have.