You are spot on, though you provided more context than can be traced directly from the cited sentence. When I referred to naive RL, I had in mind (PO)MDPs with an unknown reward function. The reward of an unseen state can be predicted only in the sense of Occam's-razor-type induction.
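To make the last point concrete, here is a minimal sketch of what Occam's-razor-type induction over rewards could look like. Everything in it is an assumption made up for illustration: the integer state encoding, the tiny hypothesis list, and the hand-assigned complexity scores. It only shows the idea that the agent picks the simplest reward hypothesis consistent with the rewards it has seen and uses it to guess the reward of a state it has never visited.

```python
# Hypothetical hypothesis class over integer-coded states.
# Each entry is (name, complexity score, candidate reward function).
# The complexity scores are assigned by hand purely for illustration.
HYPOTHESES = [
    ("constant 0", 1, lambda s: 0.0),
    ("constant 1", 1, lambda s: 1.0),
    ("parity", 2, lambda s: float(s % 2)),
    ("threshold at 3", 3, lambda s: float(s >= 3)),
]


def simplest_consistent(observed):
    """Return the lowest-complexity hypothesis that matches every observed reward."""
    consistent = [
        h for h in HYPOTHESES
        if all(h[2](state) == reward for state, reward in observed.items())
    ]
    if not consistent:
        raise ValueError("no hypothesis in the class fits the observations")
    return min(consistent, key=lambda h: h[1])


# Rewards the agent has actually experienced (state -> reward).
observed = {0: 0.0, 1: 1.0, 2: 0.0}

name, complexity, reward_fn = simplest_consistent(observed)

# State 5 was never visited; its reward is only an inductive guess.
predicted = reward_fn(5)
print(name, predicted)
```

The prediction for state 5 is not guaranteed by the data: a different hypothesis class or complexity ordering would give a different guess, which is exactly the sense in which the reward of an unseen state is only inductively predictable.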