I am aware of Reinforcement Learning (I am actually sitting right next to Sutton's book on the subject, which I have read in full), but I think you are right that my point is not very clear.
The way I see it, RL goals are really only the goals of the base optimizer. The agents themselves are either not intelligent (they follow simple procedural 'policies') or are mesa-optimizers that may learn to pursue something else entirely (proxy goals, etc.). I updated the text; let me know if it makes more sense now.
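To make the distinction concrete, here is a toy policy-gradient sketch (PyTorch; everything in it, including the reward function, is a made-up illustration rather than code from any particular system). The point is where the reward lives: only in the training loop, i.e. in the base optimizer's objective, never inside the policy itself.

```python
# Toy REINFORCE-style sketch (hypothetical illustration).
# The reward function is the base optimizer's objective: it appears
# only inside the training loop, never inside the policy network.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reward(state, action):
    # Known to the training loop; completely opaque to the policy.
    return 1.0 if action == 0 else 0.0  # toy stand-in

def train_step(state):
    logits = policy(state)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    # Reward enters only here, as a weight on the log-prob gradient.
    loss = -dist.log_prob(action) * reward(state, action.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

for _ in range(100):
    train_step(torch.randn(4))

# The trained policy is just a state -> action-probability mapping;
# whatever goal it behaviorally pursues is whatever the weights happen
# to encode, which need not match the reward (hence proxies).
```

After training, the policy is just weights; whether the behavior those weights encode tracks the reward or merely a proxy for it is exactly the mesa-optimization question.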