This makes some interesting predictions re: some types of trauma: namely, that they can happen when someone was (probably even correctly!) pushing very hard towards some important goal, and then either ran out of fuel just before finishing and collapsed, or achieved the goal only for it (because of circumstances, plain bad luck, or something else) to fail to pay off in the way it usually does, societally speaking. In either case, the predictor/pusher that burned through lots of savings as an investment doesn’t get paid off. This is maybe part of why “if trauma, and help, you get stronger; if trauma, and no help, you get weaker”.
Maybe, but that also requires that the other group members were (irrationally) failing to consider that the “attempt could’ve been good even if the luck was bad”.
In human groups, people often do gain (some) reputation for noble failures (is this wrong?)
Sure, I can believe that that’s one way a person’s internal quorum can be set up. In other cases, or for other reasons, they might instead be set up to demand results, and to evaluate primarily based on results. And that’s not great or necessarily psychologically healthy, but then the question becomes “why do some people end up one way and other people the other way?” There’s also the question of just how big/significant the effort was, and thus how big of an effective risk the one predictor took. Whether internal to one person or playing out across a group of humans, a sufficiently grand-scale noble failure will not generally be seen as all that noble (IME).
Why might it be set up like that? Seems potentially quite irrational. Veering into motivated reasoning territory here imo
Parts of the human mind are not little humans. They are allowed to be irrational. It can’t be rational subagents all the way down. Rationality itself is probably implemented as subagents saying “let’s observe the world and try to make a correct model” winning a reputational war against subagents proposing things like “let’s just think happy thoughts”.
But I can imagine how some subagents could have less trust than others towards “good intentions that didn’t bring actual good outcomes”. For example, if you live in an environment where it is normal to make dramatic promises and then fail to act on them. I remember reading, long ago, books claiming that children of alcoholic parents are often like that: they stop listening to promises and excuses, because they have already heard too many of them and have learned that nothing ever happens. I can imagine them turning this habitual mistrust against themselves, too. “I tried something, and it was a good idea, but due to bad luck it failed” resembles too closely the parent explaining how they had the good insight that they need to stop drinking, but due to some external factor had to drink yet another bottle today. In short, if your environment fails you a lot, you can respond by becoming unrealistically harsh on yourself.
Another possible explanation is that different people’s attention is focused on different places. Some people pay more attention to promises, some to material results, some to their feelings. This itself can be a consequence of their previous experience with paying attention to different things.
I wouldn’t say the subconscious calibrating on more substantial measures of success, such as “how happy something made me” or “how much status that seems to have brought”, is irrational. What you’re proposing, it seems to me, is calibrating only on how good an idea it was according to the predictor part / System 2. Which gets calibrated, I would guess, when the person analyses the situation? But if System 2 is sufficiently bad, calibrating on pure results is a good way to shield against pursuing a goal whose pursuit yields nothing but System 2’s evaluations that the person did well. Which is bad, if one of the end goals of the subconscious is “objective success”.
For example, a situation I could easily imagine myself in: every day I struggle to go to bed, because I can’t put away my phone. But when I finally do, at 23:30, I congratulate myself: it took a lot of effort, and I did actually succeed in giving myself almost enough time to sleep. If I didn’t recalibrate rationally, and “me-who-uses-internal-metrics-of-success” were happy with good effort every day, I’d keep doing it. All while the real me would soon get fed up and get a screen-blocker app that turns on at 23:00, so I’d sleep well every day at no willpower cost. (Plus or minus the other factors, and supposing the phone after 23:00 isn’t very important to some parts of me.)
In machine-learning terms, this is the difference between model-free learning (reputation based on success/failure record alone) and model-based learning (reputation can be gained for worthy failed attempts, or lost for foolish lucky wins).
Under that definition you end up saying that what are usually called ‘model-free’ RL algorithms like Q-learning are model-based. E.g. in Connect 4, once you’ve learned that getting 3 in a row has a high value, you get credit for taking actions that lead to 3 in a row, even if you ultimately lose the game.
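A minimal sketch of that bootstrapping effect, using hypothetical state/action names rather than a real Connect 4 implementation: tabular Q-learning’s update target uses its own learned values at the next state, so a move that reaches a state the agent believes is good gets credited even if the game is later lost.

```python
from collections import defaultdict

Q = defaultdict(float)   # learned action values, keyed by (state, action)
alpha, gamma = 0.1, 0.99

def q_update(state, action, reward, next_state, next_actions):
    """One Q-learning step: the target bootstraps on the max of the
    *learned* Q-values at next_state, not on the eventual game outcome."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Suppose earlier training has made "three in a row" states look valuable:
Q[("three_in_a_row", "extend")] = 0.8

# The agent now plays a move that creates three in a row but later loses.
# The move is still reinforced, because the update consults learned values:
q_update("two_in_a_row", "play_col_3", reward=0.0,
         next_state="three_in_a_row", next_actions=["extend", "block"])
print(Q[("two_in_a_row", "play_col_3")])  # positive, despite the eventual loss
```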
I think it is kinda reasonable to call Q-learning model-based, to be fair, since you can back out a lot of information about the world from the Q-values with little effort.
Ah, yeah, sorry. I do think about this distinction more than I think about the actual model-based vs model-free distinction as defined in ML. Are there alternative terms you’d use if you wanted to point out this distinction? Maybe policy-gradient vs … not policy-gradient?
Not sure. I guess you also have to exclude policy gradient methods that make use of learned value estimates. “Learned evaluation vs sampled evaluation” is one way you could say it.
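For concreteness, a rough sketch of the two estimators that contrast points at, with made-up numbers and a constant stand-in for what would really be a trained critic: REINFORCE weights the policy gradient by the sampled Monte Carlo return, while an actor-critic style update substitutes an evaluation built from learned values.

```python
rewards = [0.0, 0.0, 1.0]   # one sampled episode (hypothetical numbers)
gamma = 0.99

# Sampled evaluation: the Monte Carlo return actually observed from t=0.
sampled_return = sum(gamma ** t * r for t, r in enumerate(rewards))

# Learned evaluation: a critic's estimate of the same quantity, usable
# even when this particular rollout got unlucky. A real critic would be
# trained; this constant is a stand-in.
def critic_value(state):
    return 0.9

# REINFORCE weights log-prob gradients by sampled_return; an actor-critic
# style update would weight them by a TD error from the learned critic.
td_error = rewards[0] + gamma * critic_value("s1") - critic_value("s0")
print(sampled_return, td_error)
```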
Model-based vs model-free does feel quite appropriate; it’s a shame it’s used for a narrower kind of model in RL. Not sure if it’s used in your sense in other contexts.