First, it’s uncomputable to measure performance because that involves the Solomonoff prior. You can approximate it if you know some bits of Chaitin’s constant, but that brings a penalty into the description complexity.
Second, I think that saying that comparison is computable means that the utility is only allowed to depend on a finite number of time steps, it rules out even geometric time discount. For such utility functions, the optimal policy has finite description complexity, so g is upper bounded. I doubt that’s useful.
Re first, yep, I missed that :(. M does sound like a more worthy barrier than U. Do you have a working example of a (U,M) where some state machine performs well in a manner that’s hard to detect?
Re second, I realized that this only allows discrete utilities but didn’t think to therefore try a π′ that does an exhaustive search over policies ^^. (I assume you are setting “uncomputable to measure performance because that involves the Solomonoff prior” aside here.) Even so, undecidability of whether 000… and 111… get the same utility sounds like a bug. What other types have you considered for the P representing U?
The box I’m currently thinking in is that a strict upper bound on what we can ask of P is that it decide what statements are true of U. So perhaps we impose some reasonableness constraint on statements, and then can directly ask whether e.g. some observation sequence matching regex1 is preferable to all observation sequences matching regex2?
I don’t think that undecidability of exact comparison (as opposed to comparison within any given margin of error) is necessarily a bug, however, if you really want comparison for periodic sequences, you can insist that the utility function is defined by a finite state machine. This is in any case already a requirement in the bounded compute version.
First, it’s uncomputable to measure performance because that involves the Solomonoff prior. You can approximate it if you know some bits of Chaitin’s constant, but that brings a penalty into the description complexity.
Second, I think that saying that comparison is computable means that the utility is only allowed to depend on a finite number of time steps, it rules out even geometric time discount. For such utility functions, the optimal policy has finite description complexity, so g is upper bounded. I doubt that’s useful.
Re first, yep, I missed that :(. M does sound like a more worthy barrier than U. Do you have a working example of a (U,M) where some state machine performs well in a manner that’s hard to detect?
Re second, I realized that this only allows discrete utilities but didn’t think to therefore try a π′ that does an exhaustive search over policies ^^. (I assume you are setting “uncomputable to measure performance because that involves the Solomonoff prior” aside here.) Even so, undecidability of whether 000… and 111… get the same utility sounds like a bug. What other types have you considered for the P representing U?
The box I’m currently thinking in is that a strict upper bound on what we can ask of P is that it decide what statements are true of U. So perhaps we impose some reasonableness constraint on statements, and then can directly ask whether e.g. some observation sequence matching regex1 is preferable to all observation sequences matching regex2?
Reviewing my “contribution” so far, I’d like to make sure I don’t run out your patience; feel free to ask me to spend way more time thinking before I comment, or attempt https://www.lesswrong.com/posts/sPAA9X6basAXsWhau/announcement-learning-theory-online-course first.
I don’t think that undecidability of exact comparison (as opposed to comparison within any given margin of error) is necessarily a bug, however, if you really want comparison for periodic sequences, you can insist that the utility function is defined by a finite state machine. This is in any case already a requirement in the bounded compute version.