Yeah, not being able to say “negative reward”/“punishment” when you use “reinforcement” seems very costly. I’ve run into that problem a bunch.
And yeah, that makes sense. I get the “reward implies more model-based thinking” part. I kind of like that distinction, so I’m tentatively in favor of using “reward” for more model-based stuff and “reinforcement” for more policy-gradient-based stuff, if other considerations don’t outweigh that.