“Reward” in the context of reinforcement learning is the “goal” we’re training the program to maximise, rather than a literal dopamine hit. For instance, AlphaGo’s reward is winning games of Go. When it wins a game, it adjusts itself to do more of what won it the game, and adjusts away from those moves when it loses. It’s less like the reward a human gets from eating ice-cream, and more like the feedback a coach might give you on your tennis swing that lets you adjust and make better shots. We have no reason to suspect there’s anything analogous to the human experience of feeling good.
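To make the “just a number” point concrete, here is a minimal sketch of a REINFORCE-style policy-gradient update on a toy two-armed bandit. The setup, variable names, and reward values are illustrative assumptions for this sketch, not how AlphaGo is actually trained; the reward is simply a scalar that scales a parameter update.

```python
import numpy as np

# Minimal sketch: reward-driven adjustment on a toy two-armed bandit.
# "Reward" here is just a number (+1 win, -1 loss) used to nudge parameters;
# nothing is "felt" anywhere.

rng = np.random.default_rng(0)
logits = np.zeros(2)      # the program's adjustable parameters
learning_rate = 0.1

def win_probability(action):
    # Hypothetical environment: action 1 wins more often than action 0.
    return 0.3 if action == 0 else 0.7

for episode in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax policy
    action = rng.choice(2, p=probs)
    reward = 1.0 if rng.random() < win_probability(action) else -1.0
    # Gradient of log pi(action): one-hot(action) minus the probabilities.
    grad_log_prob = -probs
    grad_log_prob[action] += 1.0
    # Winning actions become more likely, losing actions less likely.
    logits += learning_rate * reward * grad_log_prob

print(probs)   # the policy ends up favouring the higher-winning action
```

The “adjust itself to do more of what won” in the paragraph above corresponds to the final update line: the reward multiplies the gradient, so a win pushes the parameters towards the chosen action and a loss pushes them away.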