habryka comments on TurnTrout’s shortform feed

habryka 3 Feb 2024 17:14 UTC
4 points
Yeah, not being able to say “negative reward”/“punishment” when you use “reinforcement” seems very costly. I’ve run into that problem a bunch.
And yeah, that makes sense. I get the “reward implies more model-based thinking” part. I kind of like that distinction, so am tentatively in favor of using “reward” for more model-based stuff, and “reinforcement” for more policy-gradient-based stuff, if other considerations don’t outweigh that.