Wikipedia has an unfortunate and incorrect-in-generality description of reinforcement learning (emphasis added).

Later in the article, talking about basic optimal-control-inspired approaches:
The purpose of reinforcement learning is for the agent to learn an optimal, or nearly-optimal, policy that maximizes the “reward function” or other user-provided reinforcement signal that accumulates from the immediate rewards. This is similar to processes that appear to occur in animal psychology. For example, biological brains are hardwired to interpret signals such as pain and hunger as negative reinforcements, and interpret pleasure and food intake as positive reinforcements. In some circumstances, animals can learn to engage in behaviors that optimize these rewards.
Reward is not the optimization target.

It’s not really a surprise that (IMO) the alignment field has anchored on “reward is target” intuitions, given that the broader field of RL has as well. Given this bad initialization, conscious effort and linguistic discipline are required to correct it.
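For concreteness, here is a minimal toy sketch of the mechanistic point at issue: a tabular REINFORCE-style update on a three-armed bandit, with made-up reward values and hyperparameters. It is only meant to show where reward actually enters such an algorithm, namely as a scalar weight on an update toward actions already taken, not as an objective that the learned policy itself represents or queries.

```python
# Toy illustration (assumed setup, not taken from the discussion above):
# a tabular REINFORCE-style update on a 3-armed bandit.
import numpy as np

rng = np.random.default_rng(0)

true_mean_rewards = np.array([0.1, 0.5, 0.9])  # hypothetical environment
logits = np.zeros(3)                           # the "policy" is just these numbers
learning_rate = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(logits)
    action = rng.choice(3, p=probs)
    reward = true_mean_rewards[action] + rng.normal(scale=0.1)

    # Policy-gradient update: gradient of log pi(action) w.r.t. the logits,
    # scaled by the observed reward. The reward acts as a reinforcement
    # signal that strengthens or weakens the tendency to repeat this action;
    # the policy never computes, stores, or "pursues" the reward function.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    logits += learning_rate * reward * grad_log_pi

print("final action probabilities:", np.round(softmax(logits), 3))
```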
The description doesn’t seem so bad to me. Your post “Reward is not the optimization target” is about what actual RL algorithms actually do. The wiki descriptions here are a kind of normative motivation as to how people came to be looking into those algorithms in the first place. Like, if there’s an RL algorithm that performs worse than chance at getting a high reward, then that ain’t an RL algorithm. Right? Nobody would call it that.
I think lots of families of algorithms are likewise lumped together by a kind of normative “goal”, even if any given algorithm in that family is doing something somewhat different and more complicated than “achieving that goal”, and even if, in any given application, the programmer might not want that goal to be perfectly achieved even if it could be. So by the same token, supervised learning algorithms are “supposed” to minimize a loss, compilers are “supposed” to create efficient and correct assembly code, word processors are “supposed” to process words, etc., but in all cases that’s not a literal and complete description of what the algorithms in question actually do, right? It’s a pointer to a class of algorithms.
Sorry if I’m misunderstanding.

I agree that it is narrowly technically accurate as a description of researcher motivation. Note that they don’t offer any other explanation elsewhere in the article.
Also note that they make empirical claims:
The purpose of reinforcement learning is for the agent to learn an optimal, or nearly-optimal, policy that maximizes the “reward function” or other user-provided reinforcement signal that accumulates from the immediate rewards. This is similar to processes that appear to occur in animal psychology...
In some circumstances, animals can learn to engage in behaviors that optimize these rewards.
Sure. That excerpt is not great.

(I do think that animals care about the reinforcement signals and their tight correlates, to some degree, such that it’s reasonable to gloss it as “animals sometimes optimize rewards.” I more strongly object to conflating what the animals may care about with the mechanistic purpose/description of the RL process.)
I encourage you to fix the mistake. (I can’t guarantee that the fix will be incorporated, but for something this important it’s worth a try.)