On Akrasia, Habits and Reward Maximization

Content Warning: Discussion of Free Will. Please do not read further if you believe in free will and suspect that this belief is important to your mental well-being.

For most of my life, I’ve struggled with terrible akrasia, like many here. While it wasn’t bad enough to stop me from getting a degree, it did stop me from planning out my life as well as I’d have liked, and made it much harder to be productive after graduation. Recently I’ve made a significant breakthrough that’s dramatically boosted both my productivity and my enjoyment of life in general. I’ve held off on posting about it until now because it’s very easy to think you’ve made a breakthrough when the seeming benefits arise only from false hope and coincidence, much as it’s disturbingly easy to get people to swear to benefits from homeopathy. However, at this point, I’ve seen a large enough improvement for long enough that it seems worth posting about it in case it helps other LessWrongers.

I hypothesize that human behavior largely takes place in two modes: habit and plan selection/execution. Habits primarily form when a response to a situation results in the release of dopamine, which in turn wires the brain to respond to that situation in a similar way again. As CFAR likes to note, this means that habits have triggers, for good or ill, and one can both reduce the temptation of bad habits by avoiding situations that trigger them, and set up Trigger-Action Plans to deliberately set off good habits. Perhaps the majority of our behavior is simply one habit chaining into the next: the end of work triggering the start of lunch, the end of a phone call triggering another trip to Reddit.

So far so good, but so far that’s also nothing new to LessWrong, and nothing that’s let me make significant improvements in my life. The breakthrough came when I combined that understanding of habits with the idea that when one isn’t engaged in habitual behavior, one’s actions are set by a reward-maximizing mode of automatic plan selection. As per Scott Alexander’s excellent article on the subject, much of human behavior appears to be due to a region of the brain called the striatum determining our decisions. The striatum has a menu of plans and an associated expected reward for each one, encoded in dopamine. This appears to be how lampreys make decisions, and there’s significant reason to believe that humans work much the same way. This is why I included the content warning about free will: it looks suspiciously like the striatum always picks the plan with the highest expected reward, and that willpower doesn’t really enter into the equation. That’s not to say that willpower doesn’t exist; clearly we’ve all used it before^[1], but it looks like it might not change the outcome. Certainly people sometimes do difficult or unpleasant things because they believe them to be worthwhile, but Scott’s article (and Stephan Guyenet’s research that Scott was writing about) makes it seem very much like that’s the result of the striatum concluding that the difficult thing will lead to more reward overall, even if there’s something unpleasant along the way.
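To make the "menu of plans" picture concrete, here is a toy sketch in Python. It illustrates only the selection rule described above (pick the plan with the highest expected reward) and makes no claim about how the striatum actually computes anything. The names `Plan` and `select_plan`, and all of the reward numbers, are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    # Hypothetical stand-in for the dopamine-encoded estimate of overall
    # reward, with any unpleasantness along the way already netted out.
    expected_reward: float


def select_plan(menu):
    """Return the plan with the highest expected reward (a plain argmax)."""
    if not menu:
        raise ValueError("the menu of plans is empty")
    return max(menu, key=lambda plan: plan.expected_reward)


# Illustrative numbers only; nothing here is measured.
menu = [
    Plan("take another trip to Reddit", 0.5),
    Plan("do the difficult but worthwhile thing", 0.7),
    Plan("do nothing in particular", 0.2),
]

chosen = select_plan(menu)
print(chosen.name)
```

Note that the sketch has no separate "willpower" variable. The difficult plan wins only when its estimated overall reward is the highest on the menu, which is the possibility that prompted the content warning.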

How do I use triggered habits and my brain’s dopamine maximizing behavior to reduce akrasia?

Part One: Being aware of the habit/reward maximizing dichotomy makes it much easier to stay on the reward maximizing side, which in turn leads me to take better actions. For example, it’s very easy for me to automatically start playing video games after finishing work, or after spending time with a friend. Sometimes this is endorsed, a fun break and way to relax. Other times though, I’d really rather cook, take a shower, work some more on my machine learning project or hit the gym. If I’m not thinking about habits and the fact that I don’t necessarily want them choosing my actions for me, gaming is simply something that happens. If I am thinking about this, I can instead consider what action will actually produce the most reward. Often that’s something other than gaming, and even when I do choose to game, it’s consistently more fun to do so as an endorsed decision rather than a mindless routine.

Part Two: Awareness of what habits I’m engaging in lets me judge whether a habit is rewarding or not. Some habits, like intrusive thoughts of guilt or fear, aren’t rewarding at all, and persist solely through having been ground in over time.^[2] Recognizing their lack of desirability and actively turning my thoughts away from such things^[3] has allowed me to eliminate, seemingly very effectively, intrusive thoughts that had lasted for years. A similar awareness of habits I judge to be worthwhile makes it much easier for them to be positively reinforced, which has turned programming from something I struggled to work at consistently into a habit in its own right. It’s also resulted in more frequent insights, which feel like the positive mirror image of intrusive thoughts. Now, instead of constantly feeling afraid or guilty, I’m constantly noticing subtly better ways to communicate with people or to build my projects.

Part Three: Avoiding temptations and increasing productivity through avoiding bad triggers and creating good ones. This is standard LessWrong/CFAR material, and to be honest I haven’t used it much, nor seen much of an improvement from doing so. Nevertheless it’s a potentially-useful application of this idea, so I include it for completeness.

Part Four: Treating my planning abilities as adding options to my striatum, rather than as creating a plan that I then try to execute, only to end up wondering why I can’t make myself do it. This helps me make a point of creating plans that are rewarding, and that I’ll be motivated to actually enact. It also means that I let my striatum choose whether and when to engage in a plan, rather than forcing it, which in turn prevents building up an ugh field around that plan. While one might wonder if just letting my striatum choose when I take action would result in my procrastinating, in practice I procrastinate much less this way: if there’s genuinely a reason why something needs to be done now, awareness of that fact makes it much more rewarding to do the thing now.

Part Five: This is heavily dependent on creating rewarding plans, so what does that look like in practice? I consider why I want to do something in the first place, and the links between potential actions and the results that I want to attain. For example, I have long wanted to learn the technical details of modern neural nets, both out of sheer curiosity and a desire to start working in AI alignment. Prior to adopting this method, this desire resulted in a vague sense that I ought to look up something neural net related, followed by not actually doing so, followed by feeling bad about not doing so, followed by associating the bad feeling with neural nets and being even less likely to look up anything the next time. Now, my approach is to consider what I need specifically (a source of the next piece of information I don’t yet know, networking to help find the next source of information after that, gaining mentors, meeting people who can help me find funding), consider what actions would lead to that and how specifically that works, followed by taking those actions because I want those results. Making this work ties back into consciously noticing habits and judging whether or not they’re rewarding. That way, I notice that the part of me saying “ugh, this sounds hard, just go take a walk instead” isn’t actually in my interests and let it fade before I actually give up and take a walk, and that the part saying “hey, the Keras fit algorithm is actually really interesting, let’s figure out precisely which parts of it we don’t understand yet and see what Stack Overflow has to say on the topic” is very much in my interests and let it grow until I’ve properly studied the topic, and genuinely enjoyed doing so.

Part Six: This view of the brain suggests that emotions are reactions not to current conditions but to anticipated improvement or stagnation, and that continued effort to optimize will be rewarded with continued happiness. The fact that any static set of conditions, no matter how good, seems to be unable to provide lasting happiness is of course the notorious Hedonic Treadmill, and the fact that getting richer, more popular, etc. can temporarily alleviate it isn’t much comfort, given the fact that one eventually reaches a point where it really isn’t possible for the moment to get more money/status/sex/whatever, at which point the Treadmill can set in and strip all the enjoyment out of whatever you have! However, this model makes me suspect that so long as you continue to exert effort to optimize your position, the Treadmill will not set in, even if the position itself does not change. Marrying the most beautiful woman in the world would get boring. Marrying the most beautiful woman in the world and continuing to take an interest in what she’s doing, how she’s thinking, building up the relationship even though you can’t marry her again… that, I suspect, will not get old.

This has helped me enough that I wanted to share it, despite the difficulty of explaining personal mental routines well enough to be understood. If even a few people reading this get similar benefits from this method, so much the better! The habit awareness and judgement part of the method sounds similar enough to cognitive behavioral therapy that I wonder if I’ve inadvertently rediscovered CBT, though it’s different from CBT as I’ve heard it explained (the version I heard was to respond to negative thoughts by telling yourself the opposite message ten or fifteen times, which I’ve experimented with and found no benefit from whatsoever).

^[1] Well, most of us. Given the variety of human experience, I wouldn’t be entirely surprised to get a comment from someone reporting that they never use willpower.
^[2] Given that habits seem to form due to being rewarding the first time or first few times, that raises the question of how non-rewarding habits can exist. My hypothesis is that such habits form due to the brain expecting them to be the lesser evil, such as a fear habit being promoted due to the brain expecting fear to be less unpleasant than the feared event coming to pass. Thus, such habits can be rewarded initially and persist on sheer autopilot, but if I recognize explicitly that the fear is actually making things worse, the habit stops being positively reinforced and fades.
^[3] An important note about “turning one’s thoughts away”: for many people, myself included, simply trying not to think about something actually increases unwanted thoughts. As the old joke goes: try not to think about pink elephants. Now what are you thinking about? What I mean is more like learning how to stand up straight. You don’t freak out if you catch yourself slumping; you just stand up as best you can, move on, and correct your posture again if you find yourself slumping again.