Another thing I notice after a few years of using this:
The OP says:
Your brain already has the ability to update its cognitive strategies (this is called “meta-cognitive reinforcement learning”). However, the usual mechanism works with unnecessary levels of indirection, as in:
Cognitive strategy → Thought → Action → Reward or punishment
You get rewarded or punished for what you do (as measured by your brain’s chemical responses). Good thoughts are more likely to be followed by good actions. Good cognitive strategies are more likely to generate good thoughts. On average, your brain will slowly update its cognitive strategies in the right direction.
Cognitive strategy → Thought → Reward or punishment
You have learned to be happy or unhappy about having certain ideas, even when you don’t yet know how they apply to the real world. Now your brain gets rewarded or punished for thoughts, and on average good thoughts are more likely to be generated by good cognitive strategies. Your brain can update cognitive strategies faster, according to heuristics about what makes ideas “good”.
However, by carefully looking at the “deltas” between conscious thoughts, we can get rid of the last remaining level of indirection (this is the key insight of this whole page!):
Cognitive strategy → Reward or punishment
You have learned to perceive your cognitive strategies as they happen, and developed some heuristics that tell you whether they are good or bad. Now your brain can update cognitive strategies immediately, and do it regardless of the topic of your thoughts.
Even when you generate a useless idea from another useless idea, you can still track whether the cognitive strategy behind it was sound, and learn from the experience.
I think the author sees this as the primary insight here (i.e. getting to “Cognitive strategy → Reward or punishment”). And… I’ll be honest: I think this works, and it makes sense to me, but it doesn’t work so obviously that I’m like “yes, this underlying theory definitely checks out.”
But what I think is both more obvious and still a useful stepping stone is shifting from “Cognitive strategy → Thought → Action → Reward or punishment” toward “Cognitive strategy → Thought → Reward or punishment”. A lot of my thoughts are obviously dumb (or useful) at first glance. And moving more of my feedback loop into the first ~3 seconds, rather than longer timescales, still seems very helpful.