I don’t really see the idea of hypotheses trying to prove themselves true. Take the example of saccades that you mention. I think there’s some inherent (or learned) negative reward associated with having multiple active hypotheses (a.k.a. subagents a.k.a. generative models) that clash with each other by producing confident mutually-inconsistent predictions about the same things. So if model A says that the person coming behind you is your friend and model B says it’s a stranger, then that summons model C which strongly predicts that we are about to turn around and look at the person. This resolves the inconsistency, and hence model C is rewarded, making it ever more likely to be summoned in similar circumstances in the future.
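To make that mechanism concrete, here's a toy sketch in Python (all numbers and labels are my own invention, not a claim about how the brain actually implements this): a model that gets summoned and then rewarded for resolving an inconsistency becomes more likely to be summoned the next time a similar clash appears.

```python
import random

# Toy sketch (invented numbers): model C ("turn around and look") competes
# with doing nothing whenever two other models make clashing predictions.
# Resolving the clash is rewarded, and reward raises a model's tendency
# to be summoned again in similar circumstances.

summon_strength = {"C: turn and look": 1.0, "do nothing": 1.0}
LEARNING_RATE = 0.5

def pick_model():
    """Sample a model in proportion to its current summon strength."""
    names = list(summon_strength)
    weights = [summon_strength[n] for n in names]
    return random.choices(names, weights=weights)[0]

for episode in range(20):
    # Models A ("it's my friend") and B ("it's a stranger") clash.
    chosen = pick_model()
    # Turning to look resolves the inconsistency (rewarding); doing
    # nothing leaves the clash and its negative reward in place.
    reward = 1.0 if chosen == "C: turn and look" else -0.2
    summon_strength[chosen] = max(0.1, summon_strength[chosen] + LEARNING_RATE * reward)

print(summon_strength)  # "C: turn and look" ends up far more likely to be summoned
```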
You sorta need multiple inconsistent models for it to even make sense to try to prove one of them true. How else would you figure out which part of the model to probe? If a model were trying to prevent itself from being falsified, that would predict that we look away from things that we’re not sure about rather than towards them.
OK, so here’s (how I think of) a typical craving situation. There are two active models.
Model A: I will eat a cookie and this will lead to an immediate reward associated with the sweet taste
Model B: I won’t eat the cookie, instead I’ll meditate on gratitude and this will make me very happy
Now, from my perspective, this is great evidence that valence and reward are two different things. If becoming happy is the same as reward, why haven’t I meditated in the last 5 years even though I know it makes me happy? And why do I want to eat that cookie even though I totally understand that it won’t make me smile even while I’m eating it, or make me less hungry, or anything?
When you say “mangling the input quite severely to make it fit the filter”, I guess I’m imagining a scenario like, the cookie belongs to Sally, but I wind up thinking “She probably wants me to eat it”, even if that’s objectively far-fetched. Is that Model A mangling the evidence to fit the filter? I wouldn’t really put it that way...
The thing is, Model A is totally correct; eating the cookie would lead to an immediate reward! It doesn’t need to distort anything, as far as it goes.
So now there’s a combined Model A+D (Model A plus the “she probably wants me to eat it” thought) that says “I will eat the cookie and this will lead to an immediate reward, and later Sally will find out and be happy that I ate the cookie, which will be rewarding as well”. So model A+D predicts a double reward! That’s a strong selective pressure helping advance that model at the expense of other models, and thus we expect this model to be adopted, unless it’s being weighed down by a sufficiently strong negative prior, e.g. if this model has been repeatedly falsified in the past, or if it contradicts a different model which has been repeatedly successful and rewarded in the past.
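Numerically, I’m imagining something like the following toy scoring rule (all numbers invented for illustration, and “falsification penalty” is just a stand-in for whatever the real prior would be):

```python
# Toy scoring of competing models (illustrative numbers only):
# a model's appeal = the reward it predicts, minus a penalty for how often
# it (or a model it contradicts) has been falsified in the past.

FALSIFICATION_PENALTY = 0.6

def score(predicted_reward, times_falsified):
    return predicted_reward - FALSIFICATION_PENALTY * times_falsified

# Case 1: the "Sally will be happy I ate her cookie" addition has never been tested.
print(score(predicted_reward=2.0, times_falsified=0))  # A+D:  2.0
print(score(predicted_reward=1.0, times_falsified=0))  # A:    1.0  -> A+D gets adopted

# Case 2: similar wishful additions have been repeatedly falsified before.
print(score(predicted_reward=2.0, times_falsified=4))  # A+D: -0.4
print(score(predicted_reward=1.0, times_falsified=0))  # A:    1.0  -> plain A wins instead
```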
(This discussion / brainstorming is really helpful for me, thanks for your patience.)
If a model were trying to prevent itself from being falsified, that would predict that we look away from things that we’re not sure about rather than towards them.
That sounds like the dark room problem. :) That kind of thing does seem to sometimes happen, as people have varying levels of need for closure. But there seem to be several competing forces going on, one of them being a bias towards proving the hypothesis true by sampling positive evidence, rather than just avoiding evidence.
Model A: I will eat a cookie and this will lead to an immediate reward associated with the sweet taste
Model B: I won’t eat the cookie, instead I’ll meditate on gratitude and this will make me very happy
Now, from my perspective, this is great evidence that valence and reward are two different things. If becoming happy is the same as reward, why haven’t I meditated in the last 5 years even though I know it makes me happy? And why do I want to eat that cookie even though I totally understand that it won’t make me smile even while I’m eating it, or make me less hungry, or anything?
This is actually a nice example, because I claim that if you learn and apply the right kinds of meditative techniques and see the craving in more detail, then your mind may notice that eating the cookie actually won’t bring you very much lasting satisfaction (even if it does bring a brief momentary reward)… and then it might gradually shift over to preferring meditation instead. (At least in the right circumstances; motivation is affected by a lot of different factors.)
Which cravings get favored in which circumstances looks like a complex question that I don’t have a full model of… but we know from human motivation in general that there’s a bias towards actions that bring immediate rewards. To some extent it might be a question of the short-term rewards simply getting reinforced more. Eating a cookie takes less time than meditating for an hour, so you’re likely to eat many more cookies than you finish meditation sessions, with each eaten cookie slightly notching up the estimated confidence in the hypothesis and biasing your future decisions even more in favor of the cookie.
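As a toy illustration of the “reinforced more often” part (made-up numbers, not a real learning rule): suppose each completed episode nudges the confidence of its hypothesis toward 1, and cookie episodes simply complete far more often than hour-long meditation sessions.

```python
# Made-up illustration: each completed episode nudges confidence in its
# hypothesis toward 1, and quick cookie episodes complete far more often
# than hour-long meditation sessions do.

LEARNING_RATE = 0.1
confidence = {"cookie -> immediate reward": 0.5, "meditation -> happiness": 0.5}

episodes = {"cookie -> immediate reward": 50,  # quick, easy to finish
            "meditation -> happiness": 3}      # long, often cut short

for hypothesis, n in episodes.items():
    for _ in range(n):
        c = confidence[hypothesis]
        confidence[hypothesis] = c + LEARNING_RATE * (1.0 - c)

print(confidence)
# roughly {'cookie -> immediate reward': 0.997, 'meditation -> happiness': 0.64}
```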
The thing is, Model A is totally correct; eating the cookie would lead to an immediate reward! It doesn’t need to distort anything, as far as it goes.
So the prediction that craving makes isn’t actually “eating the cookie will bring reward”; I’m not sure what the exact prediction is, but it’s closer to something like “eating the cookie will lead to less dissatisfaction”. And something like the following may happen:
You’re trying to meditate, and happen to think of the cookie on your desk. You get a craving to stop meditating and go eat the cookie. You try to resist the craving, but each moment that you resist it feels unpleasant. Your mind keeps telling you that if you just gave in to the temptation, then the discomfort from resisting it would stop. Finally, you might give in, stopping your meditation session short and going to eat the cookie.
What happened here was that the craving told you that in order to feel more satisfied, you need to give in to the craving. When you did go eat the cookie, this prediction was proven true. But there was a self-fulfilling prophecy there: the craving told you that the only way to eliminate the discomfort was by giving in to the craving, when just dropping the craving would also have eliminated the discomfort. Maybe the craving didn’t exactly distort the sense data, but it certainly sampled a very selected part of it.
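A minimal way to state the “selected sample” point (again just my own toy framing, with invented labels): both giving in and dropping the craving end the discomfort, but only the branch that actually gets sampled ever produces confirming evidence.

```python
# Toy framing: the discomfort comes from the active craving being resisted,
# so anything that ends the craving ends the discomfort -- but only the
# sampled branch gets to "confirm" its prediction.

def discomfort_after(action):
    craving_still_active = (action == "keep resisting")
    return 1.0 if craving_still_active else 0.0

for action in ["keep resisting", "give in and eat the cookie", "drop the craving"]:
    print(action, "->", discomfort_after(action))
# keep resisting             -> 1.0
# give in and eat the cookie -> 0.0  (the branch that usually gets sampled)
# drop the craving           -> 0.0  (would work too, but rarely gets tried)
```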
The reason why I like to think of cravings as hypotheses is that if you develop sufficient introspective awareness for the mind to see in real time that the craving is actively generating discomfort rather than helping avoid it, then (that particular) craving will be eliminated. The alternative hypothesis that replaces it is then something like “I’m fine even if I go without a cookie for a while”.
Interesting!