I think that you still haven’t quite grasped what I was saying. “Reward is not the optimization target” totally applies here. (The post itself only analyzed the model-free case; that doesn’t mean the lesson only applies to the model-free case.)
In the partial quote you provided, I was discussing two specific algorithms which are highly dissimilar to those being discussed here. If (as we were discussing) you’re doing MCTS (or “full-blown backwards induction”) on reward for the leaf nodes, the system optimizes the reward. That is, if most of the optimization power comes from explicit search on an explicit reward criterion (as in AIXI), then you’re optimizing for reward. If you’re doing e.g. AlphaZero, that aggregate system isn’t optimizing for reward.
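To make the first case concrete, here is a minimal toy sketch of "explicit search on an explicit reward criterion": plain backward induction over a tiny hand-written tree, where the only thing driving the choice of action is the reward assigned to the leaves. Everything in it (the `TREE` layout, the `LEAF_REWARD` values, the function name) is a hypothetical illustration, not AIXI, not a real MCTS implementation, and not anything from the original discussion.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tree: each non-leaf state maps to its child states.
TREE: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}

# Hypothetical explicit reward criterion, defined only on leaf states.
LEAF_REWARD: Dict[str, float] = {"a1": 1.0, "a2": 0.0, "b1": 0.5, "b2": 3.0}


def backward_induction(
    state: str, reward: Callable[[str], float]
) -> Tuple[float, List[str]]:
    """Return the best achievable leaf reward from `state` and the path to it."""
    children = TREE.get(state)
    if not children:
        # Leaf: score it directly with the explicit reward criterion.
        return reward(state), [state]
    # Internal node: recurse into every child and keep the highest-valued one.
    best_value, best_path = max(
        (backward_induction(child, reward) for child in children),
        key=lambda pair: pair[0],
    )
    return best_value, [state] + best_path


value, path = backward_induction("root", LEAF_REWARD.__getitem__)
print(value, path)
```

In this sketch all of the optimization power comes from the search itself, so whatever `LEAF_REWARD` says is what gets optimized. There is no learned policy or value function in the loop, which is the structural difference from the AlphaZero-style case.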
Despite the derision which accompanies your discussion of “Reward is not the optimization target”, it seems to me that you still do not understand the points I’m trying to communicate. You should be aware that I don’t think you understand my views or that post’s intended lesson. As I offered before, I’d be open to discussing this at more length if you want clarification.
CC @faul_sname