This is part 29 of 30 in the Hammertime Sequence. Click here for the intro.
I find myself dragging my feet on the last couple of days of each Hammertime cycle. From this and several other data points, I think my current writing attention span is around a week, and drafts and outlines that sit for more than a week feel too stale to finish. Had I known this in advance, I would probably have structured Hammertime as six 5-day sprints.
Reinforcement Learning?
What happens when reinforcement learning isn’t enough?
You are playing a game of Go against sensei. On move twenty-four, sensei invades your three-space extension with devastating precision, cutting a group you thought was safe into two scattered dragons. The left dragon tries to run away, but sensei cuts its escape route off with a delicate leaning attack on your corner enclosure. It dies with abandon.
The right dragon, now facing the massive wall sensei built up by attacking the left group, tries frantically to make life locally. Its second eye is poked out unceremoniously by a well-placed tesuji. Because of your struggle, sensei has fifty points of territory and thickness radiating across the entire board. You resign.
What is a novice supposed to learn from a game like this? If your teacher leaves you to your own devices to review the game, you might easily conclude any of the following, if not a dozen other things:
1. Don’t make three-space extensions.
2. Never try to run away.
3. Do not respond to leaning moves.
4. Sacrifice early.
5. Study life and death.
Let’s say you learn lesson 1: don’t make three-space extensions. In the next week’s teaching game, you dutifully plod out two spaces from each approach. Sensei’s stones are balanced and efficient while yours are over-concentrated and unimaginative. You lose handily by points.
What happens now? Do you return to three-space extensions, frustrated with two-space ones?
Over-correction and Learning Stopsigns
The Strategic Level is a CFAR flash class about learning strategically: updating in a way that will actually prevent the same failure modes in the future. The kind of learning above is definitely not strategic.
As I see it, there are two common and overlapping failure modes in learning, where the lessons learned can be worse than nothing.
The first kind is over-correction:
Had an argument: “I should be more understanding.”
Had a panic attack: “I should just care less about everything.”
Was a White Knight at Dragon Army: “I should just never trust human beings.”
Lost a Go game: “I should never make three-space jumps.”
Such overly general lessons can be cures worse than the disease. As your simple strategies progressively fail, you need to come up with and try more and more complicated strategies. You can’t just continually bounce between two extremes, refusing to stare the complexity of reality in the face.
The second type of failure is similarly unproductive:
I should have just read out that dan-level life and death problem!
I should have just studied chapter 3 instead of chapter 2!
I should have just tried to use the polynomial method on that problem!
I call these thoughts learning stopsigns. A common type of learning stopsign is of the form “should have done so and so,” where so and so is some arbitrary, brilliant, unreasonable choice you would never have made in advance. Just as semantic stopsigns masquerade as answers, learning stopsigns masquerade as lessons learned while not actually providing practical utility for the future.
The learning stopsign simply says: turn back, nothing to see here, painful thoughts past this point. It’s usually accompanied by a nonchalant shrug.
Strategic Learning
What does it mean to learn strategically?
Whenever you fail, try to answer the question, “What way of thinking would I have had to employ to have caught this problem ahead of time?” Every lesson learned is a chance to tune your cognitive strategies to prevent as wide a class of similar problems as possible in the future.
At the very least, learn to recognize unproductive over-correction and to drive past learning stopsigns. When you encounter a failure and make a snap judgment about what went wrong, ask yourself: is it any less likely I’ll fail in the same way again?
Exercise: Set a Yoda Timer and meditate on your most recent mistakes.
Daily Challenge
Share a story of a cure that was worse than the disease.
Not too long ago my girlfriend said a few things I found hurtful. A few days later I decided to talk things through with her. Unfortunately, she was in a rather bad mood that day for different reasons (which I hadn’t fully comprehended until that point), which caused the talk to derail a bit and become more hurtful, unlike in the past, when these meta relationship talks had always worked rather well.
My initial reaction was to assume she had just changed over time and had somehow “lost her empathy.” This was basically the fundamental attribution error: assuming that this is how she is now, rather than that she was having a bad day and we should talk about it some other time. My over-correction was to decide not to have such talks anymore and to keep things to myself. Fortunately, she sorted things out later and I quickly made sure to unlearn that lesson.
One meta lesson I learned from this is that I personally tend to over-correct on any kind of negative evidence in relationships and probably social situations in general.
A while back in high school, a talented acquaintance of mine started promoting their music before it was good. They did the whole nine yards—bought fake social media followers, created their own fan pages, bought ads, a photoshoot, etc. They would not stop talking about their upcoming success in the music industry. Almost a decade later, they are working odd jobs, hoping to “blow up”.
The lesson I took from that (back then) was “do not promote until you have the finished product. Do not talk about what you do until it’s good enough, just put your head down and get better.” I spent almost a decade making unfinished project after unfinished project, unwilling to release or promote them because they weren’t “good enough yet”. I way overcompensated.
In this case, the cure was worse than the disease (even though I greatly improved) because putting out bad music wouldn’t mean that I’d be known as a person who puts out bad music—I’d be known as “a person who puts out music”, which is a valuable thing. I’d be much more prone to positive black swan events from having my name out there. Plus, it’s not like I would stop improving—I could’ve easily had the best of both worlds.
I’m now aiming at the synthesis of those two views—being humble and diligent about improving while being okay with putting out imperfect things. 80% of 1000 > 100% of 0.