I agree that it is narrowly technically accurate as a description of researcher motivation. Note that they don’t offer any other explanation elsewhere in the article.
Note, too, that they make empirical claims:
The purpose of reinforcement learning is for the agent to learn an optimal, or nearly-optimal, policy that maximizes the “reward function” or other user-provided reinforcement signal that accumulates from the immediate rewards. This is similar to processes that appear to occur in animal psychology...
In some circumstances, animals can learn to engage in behaviors that optimize these rewards.
(I do think that animals care about the reinforcement signals and their tight correlates, to some degree, such that it’s reasonable to gloss it as “animals sometimes optimize rewards.” I more strongly object to conflating what the animals may care about with the mechanistic purpose/description of the RL process.)
I agree that it is narrowly technically accurate as a description of researcher motivation. Note that they don’t offer any other explanation elsewhere in the article.
Note, too, that they make empirical claims:
Sure. That excerpt is not great.
(I do think that animals care about the reinforcement signals and their tight correlates, to some degree, such that it’s reasonable to gloss it as “animals sometimes optimize rewards.” I more strongly object to conflating what the animals may care about with the mechanistic purpose/description of the RL process.)