Wikipedia has an unfortunate and incorrect-in-generality description of reinforcement learning (emphasis added).

Later in the article, talking about basic optimal-control-inspired approaches:
The purpose of reinforcement learning is for the agent to learn an optimal, or nearly-optimal, policy that maximizes the “reward function” or other user-provided reinforcement signal that accumulates from the immediate rewards. This is similar to processes that appear to occur in animal psychology. For example, biological brains are hardwired to interpret signals such as pain and hunger as negative reinforcements, and interpret pleasure and food intake as positive reinforcements. In some circumstances, animals can learn to engage in behaviors that optimize these rewards.
Reward is not the optimization target.

It’s not really a surprise that (IMO) the alignment field has anchored on “reward is target” intuitions, given that the broader field of RL has as well. Given this bad initialization, conscious effort and linguistic discipline are required to correct it.
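For concreteness, here is a minimal toy sketch of the mechanistic point at issue: a tabular REINFORCE-style update on a three-armed bandit, with made-up reward values and hyperparameters. It is only meant to show where reward actually enters such an algorithm, namely as a scalar weight on an update toward actions already taken, not as an objective that the learned policy itself represents or queries.

```python
# Toy illustration (assumed setup, not taken from the discussion above):
# a tabular REINFORCE-style update on a 3-armed bandit.
import numpy as np

rng = np.random.default_rng(0)

true_mean_rewards = np.array([0.1, 0.5, 0.9])  # hypothetical environment
logits = np.zeros(3)                           # the "policy" is just these numbers
learning_rate = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(logits)
    action = rng.choice(3, p=probs)
    reward = true_mean_rewards[action] + rng.normal(scale=0.1)

    # Policy-gradient update: gradient of log pi(action) w.r.t. the logits,
    # scaled by the observed reward. The reward acts as a reinforcement
    # signal that strengthens or weakens the tendency to repeat this action;
    # the policy never computes, stores, or "pursues" the reward function.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    logits += learning_rate * reward * grad_log_pi

print("final action probabilities:", np.round(softmax(logits), 3))
```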
The description doesn’t seem so bad to me. Your post “Reward is not the optimization target” is about what actual RL algorithms actually do. The wiki descriptions here are a kind of normative motivation as to how people came to be looking into those algorithms in the first place. Like, if there’s an RL algorithm that performs worse than chance at getting a high reward, then that ain’t an RL algorithm. Right? Nobody would call it that.
I think lots of families of algorithms are likewise lumped together by a kind of normative “goal”, even if any given algorithm in that family is doing something somewhat different and more complicated than “achieving that goal”, and even if, in any given application, the programmer might not want that goal to be perfectly achieved even if it could be. So by the same token, supervised learning algorithms are “supposed” to minimize a loss, compilers are “supposed” to create efficient and correct assembly code, word processors are “supposed” to process words, etc., but in all cases that’s not a literal and complete description of what the algorithms in question actually do, right? It’s a pointer to a class of algorithms.
Sorry if I’m misunderstanding.

I agree that it is narrowly technically accurate as a description of researcher motivation. Note that they don’t offer any other explanation elsewhere in the article.
Also note that they make empirical claims:
The purpose of reinforcement learning is for the agent to learn an optimal, or nearly-optimal, policy that maximizes the “reward function” or other user-provided reinforcement signal that accumulates from the immediate rewards. This is similar to processes that appear to occur in animal psychology...
In some circumstances, animals can learn to engage in behaviors that optimize these rewards.
Sure. That excerpt is not great.

(I do think that animals care about the reinforcement signals and their tight correlates, to some degree, such that it’s reasonable to gloss it as “animals sometimes optimize rewards.” I more strongly object to conflating what the animals may care about with the mechanistic purpose/description of the RL process.)
I encourage you to fix the mistake. (I can’t guarantee that the fix will be incorporated, but for something this important it’s worth a try.)