When describing the failure mode, you have the approximate-Solomonoff agent try to predict s2 (by assigning approximately uniform probability since it can’t invert the hash in O(n^2) time), and then plug that distribution into the reward function to compute the “expected reward” of action 1 (very small, since only one s2 can be correct).
However, this problem would be circumvented if the agent did things in the opposite order: first try to predict r (assigning probability 0.9 to r=1 based on the frequency data), and only then try to invert the hash and fail.
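To make the contrast concrete, here is a minimal toy sketch of the two orderings. It is not the environment from the post: the bit length `N_BITS`, the reward history `HISTORY`, and both function names are hypothetical stand-ins, and the "reward function" is reduced to "reward 1 for the single correct s2, 0 otherwise".

```python
from fractions import Fraction

# Hypothetical toy numbers, chosen only for illustration.
N_BITS = 16                                # assumed length of s2 in bits
HISTORY = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]   # assumed past rewards after action 1


def expected_reward_via_s2(n_bits: int) -> Fraction:
    """Order 1: predict s2 first, then push that distribution through the reward.

    The agent cannot invert the hash, so it spreads probability uniformly
    over all 2**n_bits candidates. Exactly one candidate earns reward 1,
    so the expectation is 1 / 2**n_bits.
    """
    num_candidates = 2 ** n_bits
    p_each = Fraction(1, num_candidates)
    return p_each * 1  # one correct candidate, all others contribute 0


def expected_reward_via_r(history: list[int]) -> Fraction:
    """Order 2: predict r directly from observed frequencies, ignoring s2."""
    return Fraction(sum(history), len(history))


if __name__ == "__main__":
    print("predict s2 first:", expected_reward_via_s2(N_BITS))
    print("predict r first: ", expected_reward_via_r(HISTORY))
```

With these assumed numbers the first ordering gives 1/65536 and the second gives 9/10, which is the gap the comment is pointing at: the same agent with the same compute limit reaches very different estimates depending on which variable it tries to predict first.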
There might be a more general rule here: an order of prediction in which you fail later rather than earlier gives better results. Or: the more approximate an answer is, the less precedence it should have in the final prediction.
This is still a bit unsatisfying, since it's not abstract reasoning (at least not obviously), but I think the equivalent would look more like abstract reasoning if the underlying predictor had to use smarter search on a smaller hypothesis space than Solomonoff induction.