I’m not sure if this is the right post in the sequence to ask this question on but: how does your model explain the differences in effects of different reinforcement schedules? Perhaps there’s some explanation of them already in the literature, but I’ve always wondered why, for instance, variable ratio scheduling is so much more motivating than fixed ratio scheduling.
I haven’t read the literature on that, but it’s always fun to speculate off the top of my head. Here goes :)
You’re deciding whether or not to pull the lever.
In a 5%-win slot machine (variable-ratio schedule), if you pull the lever, there’s a probability distribution for what will happen next, and that distribution has 5% weight on “immediate reward”. Maybe that’s sufficiently motivating to press the lever. (See Section 5.5.6.1 above.)
In a win-every-20-presses machine (fixed-ratio schedule), there are 20 different scenarios (depending on how close you are to the next reward). Probably the least motivating of those 20 scenarios is the one where you just won and you’re 20 lever-presses away from the next win. Now, the probability distribution for what happens after the next press has 0% weight on “immediate reward”. Instead, you might concoct the plan “I will press the lever 20 times and then I’ll 100% get a reward”. But that plan might not be sufficiently motivating, because it gets penalized by the boring exertion required, and the reward doesn’t count for as much because it’s distant in time.
So then I would say: a priori, it’s not obvious which one would be more motivating, but there’s no reason to expect them to be equally motivating. The winner depends on several innately-determined parameters, like how steep the hyperbolic time-discounting is, and exactly how the brain collapses the reward prediction probability distribution into a decision. And I guess that, throughout the animal kingdom, these parameters are such that the 5%-win slot machine is more motivating. ¯\_(ツ)_/¯
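To make that concrete, here’s a toy calculation. The only ingredients taken from above are the 5% win rate, the 20-press cycle, and the hyperbolic discount form V = R / (1 + k·delay); the reward size, per-press effort cost, and the candidate discount rates k are all made-up illustration numbers.

```python
def hyperbolic(reward, delay, k):
    """Subjective value of a reward `delay` presses in the future."""
    return reward / (1 + k * delay)

def variable_ratio(reward, p_win, effort, k):
    # Each press offers a p_win chance of an *immediate* reward.
    return p_win * hyperbolic(reward, 0, k) - effort

def fixed_ratio_worst_case(reward, n, effort, k):
    # Just after a win: the plan "press n times, then get the reward"
    # costs n presses of effort for one reward n presses away.
    return hyperbolic(reward, n, k) - n * effort

for k in (0.05, 0.5, 5.0):
    vr = variable_ratio(reward=1.0, p_win=0.05, effort=0.01, k=k)
    fr = fixed_ratio_worst_case(reward=1.0, n=20, effort=0.01, k=k)
    print(f"k={k}: variable={vr:+.3f}  fixed(worst case)={fr:+.3f}")
```

With shallow discounting (small k) the fixed-ratio plan beats the slot machine; steepen k and the fixed-ratio plan’s value collapses while the variable-ratio value (being immediate) is untouched. So the winner really does hinge on those innate parameters.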
My vague, uneducated intuition on the matter is that it has something to do with surprise. More specifically, that a pleasant event that is unexpected is intrinsically higher valence / more rewarding, for some reason, than a pleasant event that is expected. I don’t know why this would be the case or how it works in the brain but it fits with my life experience pretty well and likely yours too. (In the same way, an unexpected bad event feels far worse than an expected bad event in most cases.)
Then on a fixed-ratio schedule the entity quickly learns to predict each reward and finds it less rewarding; meanwhile, on a variable-ratio schedule the rewards are harder to predict and thus more compelling.
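A quick toy simulation of that idea, using a simple delta-rule (Rescorla–Wagner-style) learner. The 5%-win and win-every-20-presses numbers are from the scenario above; the learning rate, trial counts, and the choice of “presses since the last win” as the state are my own arbitrary modeling assumptions.

```python
import random

random.seed(0)

def mean_abs_surprise(schedule, n_presses=20000, lr=0.1, tail=5000):
    """Tabular delta-rule learner. State = presses since the last win;
    V[state] is the predicted reward for the next press. Returns the
    mean |prediction error| over the final `tail` presses, i.e. how
    much residual surprise is left after learning settles."""
    V = {}
    since = 0
    errors = []
    for _ in range(n_presses):
        r = schedule(since)
        err = r - V.get(since, 0.0)           # prediction error ("surprise")
        V[since] = V.get(since, 0.0) + lr * err
        errors.append(abs(err))
        since = 0 if r else since + 1
    return sum(errors[-tail:]) / tail

fixed = lambda since: 1.0 if since == 19 else 0.0            # win every 20th press
variable = lambda since: 1.0 if random.random() < 0.05 else 0.0  # 5% per press

print(f"fixed-ratio surprise:    {mean_abs_surprise(fixed):.3f}")
print(f"variable-ratio surprise: {mean_abs_surprise(variable):.3f}")
```

The fixed schedule’s residual surprise goes to essentially zero (every state becomes perfectly predictable), while the variable schedule’s stays around 0.1 forever, since the best possible prediction is “5%” and each individual outcome still violates it. So if felt reward scales with surprise, the variable schedule keeps paying out surprise indefinitely.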
But that just pushes the question back a step: why would unpredictability act as a multiplicative factor in the equation determining an event’s reward, magnifying both the highs and the lows? If it’s true, what evolutionary purpose does it serve, and how is it implemented in the brain? I’m not sure.
Hmm, maybe this (if accurate) is how curiosity and risk-aversion are implemented? Heck, maybe they’re both the same drive, an emergent result of this amplification that uncertainty hypothetically causes. Since unexpected rewards are more rewarding, entities will seek out environments in which unexpected good events are more likely to occur, e.g. novel environments (but not so novel that they are predicted to be unsafe); meanwhile, entities will avoid environments in which unexpected bad events are likely to occur, and will tend to minimize risk. (Meaning that your prediction about the valence of novel things in general has a large effect on whether novelty is more or less compelling than familiarity, so the balance of sensitivities between good and bad surprises would be a hyperparameter, perhaps differing between individuals: bear versus bull, etc.) But that’s all just conjecture.