Re first, yep, I missed that :(. M does sound like a more worthy barrier than U. Do you have a working example of a (U,M) where some state machine performs well in a manner that’s hard to detect?
Re second, I realized that this only allows discrete utilities but didn’t think to therefore try a π′ that does an exhaustive search over policies ^^. (I assume you are setting “uncomputable to measure performance because that involves the Solomonoff prior” aside here.) Even so, undecidability of whether 000… and 111… get the same utility sounds like a bug. What other types have you considered for the P representing U?
The box I’m currently thinking in is that a strict upper bound on what we can ask of P is that it decide what statements are true of U. So perhaps we impose some reasonableness constraint on statements, and then can directly ask whether e.g. some observation sequence matching regex1 is preferable to all observation sequences matching regex2?
I don’t think that undecidability of exact comparison (as opposed to comparison within any given margin of error) is necessarily a bug, however, if you really want comparison for periodic sequences, you can insist that the utility function is defined by a finite state machine. This is in any case already a requirement in the bounded compute version.
Re first, yep, I missed that :(. M does sound like a more worthy barrier than U. Do you have a working example of a (U,M) where some state machine performs well in a manner that’s hard to detect?
Re second, I realized that this only allows discrete utilities but didn’t think to therefore try a π′ that does an exhaustive search over policies ^^. (I assume you are setting “uncomputable to measure performance because that involves the Solomonoff prior” aside here.) Even so, undecidability of whether 000… and 111… get the same utility sounds like a bug. What other types have you considered for the P representing U?
The box I’m currently thinking in is that a strict upper bound on what we can ask of P is that it decide what statements are true of U. So perhaps we impose some reasonableness constraint on statements, and then can directly ask whether e.g. some observation sequence matching regex1 is preferable to all observation sequences matching regex2?
Reviewing my “contribution” so far, I’d like to make sure I don’t run out your patience; feel free to ask me to spend way more time thinking before I comment, or attempt https://www.lesswrong.com/posts/sPAA9X6basAXsWhau/announcement-learning-theory-online-course first.
I don’t think that undecidability of exact comparison (as opposed to comparison within any given margin of error) is necessarily a bug, however, if you really want comparison for periodic sequences, you can insist that the utility function is defined by a finite state machine. This is in any case already a requirement in the bounded compute version.