Charlie Steiner comments on A problem with resource-bounded Solomonoff induction and unpredictable environments

Charlie Steiner 28 Jul 2015 1:38 UTC
LW: 3 AF: 2
In your description of the failure mode, the approximate-Solomonoff agent first tries to predict $s_{2}$ (assigning approximately uniform probability, since it can’t invert the hash in $O(n^2)$ time) and then plugs that distribution into the reward function to compute the “expected reward” of action 1 (very small, since only one $s_{2}$ can be correct).

However, this problem would be circumvented if the agent did things in the opposite order: first predict $r$ (assigning probability 0.9 to $r = 1$ based on the frequency data), and only then try to invert the hash and fail.
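To make the contrast concrete, here is a rough toy sketch of the two orders of prediction. It is not the setup from the post: the 16-bit string length, the 0/1 reward that pays out only for the single correct $s_{2}$, the ten-episode history, and the function names are all assumptions I made up for illustration.

```python
from fractions import Fraction
from typing import Sequence


def expected_reward_via_state(n_bits: int) -> Fraction:
    """Predict s_2 first, then push that prediction through the reward function.

    Assumption: the bounded agent cannot invert the hash, so it spreads
    probability uniformly over all 2**n_bits candidate strings, and the
    (hypothetical) reward is 1 for the single correct s_2 and 0 otherwise.
    """
    num_candidates = 2 ** n_bits
    p_each = Fraction(1, num_candidates)
    # Only one candidate earns reward 1, so the sum collapses to one term.
    return p_each * 1


def expected_reward_via_frequency(past_rewards: Sequence[int]) -> Fraction:
    """Predict r directly from how often action 1 paid off in the past."""
    return Fraction(sum(past_rewards), len(past_rewards))


# Hypothetical history: action 1 was rewarded in 9 of 10 past episodes.
history = [1] * 9 + [0]

via_state = expected_reward_via_state(16)
via_freq = expected_reward_via_frequency(history)

print("predict s_2 first:", via_state)
print("predict r first:  ", via_freq)
```

Under these made-up numbers the first route gives an expected reward of $2^{-16}$ and the second gives 0.9, which is the gap the comment is pointing at: the same agent looks far more pessimistic about action 1 when the hard-to-predict variable comes first.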

There might be a more general rule here: choose the order of prediction in which you fail later rather than earlier. Or: the more approximate an answer, the less precedence it should have in the final prediction.

This is still a bit unsatisfying, since it’s not abstract reasoning (at least not obviously), but I think the equivalent would look more like abstract reasoning if the underlying predictor had to use smarter search on a smaller hypothesis space than Solomonoff induction.