lukstafi comments on The value of preserving reality

lukstafi 9 Nov 2010 20:10 UTC
0 points
You are spot on, though you provided more context than can be traced directly from the cited sentence. When I referred to naive RL, I had in mind (PO)MDPs with an unknown reward function. The reward of an unseen state can be predicted only in the sense of Occam's-razor-type induction.
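
A minimal sketch of that last point, under loud assumptions: the states are plain integers, the hypothesis class `HYPOTHESES` and its complexity scores are hand-picked for illustration, and `simplest_consistent` and `predict_reward` are hypothetical helper names. It only shows the shape of the induction (pick the simplest reward hypothesis consistent with the rewards seen so far, then extrapolate to an unseen state); it is not a real RL algorithm.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical hypothesis class: (name, complexity score, candidate reward function).
# The complexity scores are illustrative stand-ins for description length.
Hypothesis = Tuple[str, int, Callable[[int], float]]

HYPOTHESES: List[Hypothesis] = [
    ("constant zero", 1, lambda s: 0.0),
    ("constant one", 1, lambda s: 1.0),
    ("parity", 2, lambda s: float(s % 2)),
    ("identity", 2, lambda s: float(s)),
    ("square", 3, lambda s: float(s * s)),
]


def simplest_consistent(observed: Dict[int, float]) -> Optional[Hypothesis]:
    """Return the lowest-complexity hypothesis that matches every observed reward.

    Ties are broken by position in HYPOTHESES; returns None if nothing fits.
    """
    consistent = [
        h for h in HYPOTHESES
        if all(h[2](state) == reward for state, reward in observed.items())
    ]
    return min(consistent, key=lambda h: h[1], default=None)


def predict_reward(observed: Dict[int, float], state: int) -> Optional[float]:
    """Predict the reward of a (possibly unseen) state by Occam-style induction."""
    best = simplest_consistent(observed)
    return None if best is None else best[2](state)


if __name__ == "__main__":
    seen = {0: 0.0, 1: 1.0, 2: 2.0}    # rewards observed in visited states
    print(predict_reward(seen, 4))     # state 4 was never visited
```

The prediction for the unseen state depends entirely on the hypothesis class and the complexity ordering: nothing in the observed rewards alone pins it down.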