I think the quotes cited under “The field of RL thinks reward=optimization target” are all correct. One by one:
The agent’s job is to find a policy… that maximizes some long-run measure of reinforcement.
Yes, that is the agent’s job in RL, in the sense that if the training algorithm didn’t do that, we’d switch to another training algorithm (if we thought it was feasible for another algorithm to maximize reward). Basically, the field of RL uses a separation of concerns, where they design a reward function to incentivize good behaviour, and the agent maximizes that function. I think this is sensible, because it’s relatively easier to think “what reward function represents what I want out of this agent” than “how do I achieve this difficult task”.
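As a minimal sketch of that separation of concerns (a toy example of my own, not drawn from any of the cited papers): the designer only writes the reward function, and a generic training algorithm, tabular Q-learning here, does the maximizing.

```python
# Toy illustration of the reward-design vs. reward-maximization split.
import random

N_STATES = 5          # states 0..4 on a chain; state 4 is the goal
ACTIONS = [+1, -1]    # step right or left

def reward(next_state):
    """The designer's job: say what we want (reach state 4), not how to achieve it."""
    return 1.0 if next_state == N_STATES - 1 else 0.0

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, reward(next_state)

# The algorithm's job: find a policy that maximizes long-run reward.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)  # typically {0: 1, 1: 1, 2: 1, 3: 1}: always step toward the goal
```

The point of the sketch is just that nothing task-specific lives in the training loop; swapping in a different reward function changes what gets maximized without changing the algorithm.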
In instrumental conditioning, animals learn to choose actions to obtain rewards and avoid punishments, or, more generally to achieve goals. Various goals are possible, such as optimizing the average rate of acquisition of net rewards (i.e. rewards minus punishments), or some proxy for this such as the expected sum of future rewards.
This describes some possible goals, and I don’t see why you think the goals listed are impossible (and I don’t think they are).
We hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward.
This makes sense. RL selects agents that approximately maximize reward. Intelligence uncontroversially helps agents do that. When agents do smart thinking, they probably get reinforced (at least for the right kinds of smart thinking).
I perceive you as saying “These statements can make sense.” If so, the point isn’t that they can’t be viewed as correct in some sense—that no one sane could possibly emit such statements. The point is that these quotes are indicative of misunderstanding the points of this essay: if someone says a point as quoted, that’s unfavorable evidence on this question.
This describes some possible goals, and I don’t see why you think the goals listed are impossible (and I don’t think they are).
I wasn’t implying they’re impossible, I was implying that this is somewhat misguided. Animals learn to achieve goals like “optimizing… the expected sum of future rewards”? That’s exactly what I’m arguing against as improbable.
I’m not saying “These statements can make sense”, I’m saying they do make sense and are correct under their most plain reading.
Re: a possible goal of animals being to optimize the expected sum of future rewards, in the cited paper “rewards” appears to refer to stuff like eating tasty food or mating, where it’s assumed the animal can trade those off against each other consistently:
Decision-making environments are characterized by a few key concepts: a state space..., a set of actions..., and affectively important outcomes (finding cheese, obtaining water, and winning). Actions can move the decision-maker from one state to another (i.e. induce state transitions) and they can produce outcomes. The outcomes are assumed to have numerical (positive or negative) utilities, which can change according to the motivational state of the decision-maker (e.g. food is less valuable to a satiated animal) or direct experimental manipulation (e.g. poisoning)...
In instrumental conditioning, animals learn to choose actions to obtain rewards and avoid punishments, or, more generally to achieve goals. Various goals are possible, such as optimizing the average rate of acquisition of net rewards (i.e. rewards minus punishments), or some proxy for this such as the expected sum of future rewards[.]
It seems totally plausible to me that an animal could be motivated to optimize the expected sum of future rewards in this sense, given that ‘reward’ is basically defined as “things they value”. It seems like the way this would be false is if animals’ rewards are super unstable, or the animal doesn’t coherently trade off things they value. This could happen, but I don’t see why I should see it as overwhelmingly likely.
[EDIT: in other words, the reason the paper conflates ‘rewards’ with ‘optimization target’ is that that’s how they’re defining rewards]
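As a sketch of what “optimizing the expected sum of future rewards” would mean under this reading (my notation, not the paper’s): writing $u(o_t)$ for the numerical utility of the outcome at time $t$, the claimed goal is

$$\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T}\gamma^{t}\,u(o_t)\right],\qquad 0<\gamma\le 1,$$

which only requires that outcomes have utilities and that the animal trades them off consistently.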
I’m not saying “These statements can make sense”, I’m saying they do make sense and are correct under their most plain reading.
Yup, strong disagree with that.
“rewards” appears to refer to stuff like eating tasty food or mating, where it’s assumed the animal can trade those off against each other consistently:
If that were true, that would definitely be a good counterpoint and mean I misread it. If so, I’d retract my original complaint with that passage. But I’m not convinced that it’s true. The previous paragraph just describes finding cheese as an “affectively important outcome.” Then, later, “outcomes are assumed to have numerical… utilities.” So they’re talking about utility now, OK. But then they talk about rewards. Is this utility? It’s not outcomes (like finding cheese), because you can’t take the expected sum of future finding-cheeses—type error!
When I ctrl+F ‘rewards’ and scroll through, it sure seems like they’re talking about dopamine or RPE or that-which-gets-discounted-and-summed-to-produce-the-return, which lines up with my interpretation.
dopamine or RPE or that-which-gets-discounted-and-summed-to-produce-the-return
Those are three pretty different things—the first is a chemical, the second I guess stands for ‘reward prediction error’, and the third is a mathematical quantity! Like, you also can’t talk about the expected sum of dopamine, because dopamine is a chemical, not a number!
Here’s how I interpret the paper: stuff in the world is associated with ‘rewards’, which are real numbers that represent how good the stuff is. Then the ‘return’ of some period of time is the discounted sum of rewards. Rewards represent ‘utilities’ of individual bits of time, but the return function is the actual utility function over trajectories. ‘Predictions of reward’ means predictions of stuff like bits of cheese that is associated with reward. I do think the authors do a bit of equivocation between the numbers and the things that the numbers represent (which IMO is typical for non-mathematicians, see also how physicists constantly conflate quantities like velocity with the functions that take other physical quantities and return the velocity of something), but AFAICT my interpretation accounts for the uses of ‘reward’ in that paper (and in the intro).

That said, there are a bunch of such uses, and as a fallible human I’m probably not good at finding the ones that undermine my theory, so if you have a quote or two in mind that makes more sense under the interpretation that ‘reward’ refers to some function of a brain state rather than some function of cheese consumption or whatever, I’d appreciate you pointing them out to me.
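For reference, here are the three objects being distinguished, in standard RL/TD notation (my gloss, not a quote from the paper): the per-timestep reward $r_t$ is a number, the return $G_t$ is its discounted sum, and the reward prediction error $\delta_t$ is the TD error that dopamine is often hypothesized to track:

$$r_t \in \mathbb{R}, \qquad G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t).$$

On the interpretation above, $G_t$ plays the role of the utility of a trajectory, while $r_t$ is the utility of a single time step.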