My understanding of Alex’s point is that the word “reward” invokes a mental image involving model-based planning: “ooh, there’s a reward, what can I do right now to get it?”. And the word “reinforcement” invokes a mental image involving change (i.e., weight updates): when you reinforce a bridge, you’re permanently changing something about the structure of the bridge, such that the bridge will (hopefully) be better in the future than it was in the past.
So if you want to reason about policy-gradient-based RL algorithms (for example), that’s a (pro tanto) reason to use the term “reinforcement”. (OTOH, if you want to reason about RL-that-mostly-involves-model-based-planning, maybe that’s a reason not to!)
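To make that concrete, here’s a toy policy-gradient (REINFORCE-style) sketch; the bandit environment and its payoff probabilities are my own illustrative inventions, not anything from Alex. The thing to notice is that the reward never enters the action-selection step at all: it only shows up afterwards, as a scalar that scales a permanent weight update. That’s the “reinforcement” picture. A model-based planner would instead query a reward/value model at decision time.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)  # logits for a toy 2-action policy

def policy_probs(theta):
    # Softmax over logits (shifted for numerical stability).
    z = np.exp(theta - theta.max())
    return z / z.sum()

# Hypothetical environment: action 1 pays off more often than action 0.
def env_reward(action):
    return float(rng.random() < (0.8 if action == 1 else 0.2))

for _ in range(1000):
    probs = policy_probs(theta)
    action = rng.choice(2, p=probs)    # acting: reward plays no role here
    reward = env_reward(action)
    grad_logp = -probs
    grad_logp[action] += 1.0           # gradient of log pi(action) w.r.t. theta
    theta += 0.1 * reward * grad_logp  # reward only scales the weight change

print(policy_probs(theta))  # probability mass has shifted toward the reinforced action
```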
For my own writing, I went back and forth a bit, but wound up deciding to stick with textbook terminology (“reward function” etc.), for various reasons: all the usual reasons that using textbook terminology is generally good for communication, plus an irreconcilable jargon clash around what the specific term “negative reinforcement” means (in the behaviorist literature it means removing an aversive stimulus to strengthen a behavior, not delivering a punishment). But I try to stay aware of situations where people’s intuitions around the word “reward” might be leading them astray in context, so I can explicitly call that out and try to correct it.
Yeah, not being able to say “negative reward”/”punishment” when you use “reinforcement” seems very costly. I’ve run into that problem a bunch.
And yeah, that makes sense. I get the “reward implies more model-based thinking” part. I kind of like that distinction, so I’m tentatively in favor of using “reward” for more model-based stuff and “reinforcement” for more policy-gradient-based stuff, if other considerations don’t outweigh that.