Reinforcement Via Giving People Cookies
I.
Thinking By The Clock is now the most popular thing I’ve written on LessWrong, so here’s another entry in the list of things that significantly changed how I think and operate, and that I learned from a few stray lines of Harry Potter and the Methods of Rationality. It’s quite appropriate for this subject to be the followup you all get because the last one got upvoted so much.
“I was going to give you more space,” said Harry Potter, “only I was reading up on Critch’s theories about hedonics and how to train your inner pigeon and how small immediate positive and negative feedbacks secretly control most of what we actually do, and it occurred to me that you might be avoiding me because seeing me made you think of things that felt like negative associations, and I really didn’t want to let that run any longer without doing something about it, so I got ahold of a bag of chocolates from the Weasley twins and I’m just going to give you one every time you see me as a positive reinforcement if that’s all right with you—”
As far as I can tell this just straightforwardly works.
I hereby propose giving immediate positive feedback for things you want more of, or in simpler words, give people cookies. In my own experience, this really works, and it works on many levels. There are more ways to go astray ethically with negative reinforcement so I am not here making an argument to use that side of the coin, but offering people positive reinforcement seems pretty unobjectionable to me. Reward your friends, reward your enemies, reward yourself!
II.
Let’s start with that last point about rewarding yourself.
There’s a particular treat I give myself every time I work out. As soon as I finish the workout, I get the treat. (A fruit smoothie.) This has been going on for years, to the point where my reaction is basically Pavlovian. By the time I finish lacing up my running shoes, I’m already thinking of the reward. Sometimes I’ve noticed an internal urge to go for a run or pick up the weights, and when I trace the source of the urge it’s often that a smoothie sounds good right now. I seem to be unusually good at holding myself to my own rules (most people remark that they could just make the smoothie and not work out, and predict that they would do that instead) but I’m at least n=1 evidence that you can classically condition yourself. But we can go smaller and faster.
There’s this thing I see people do sometimes where they do something and then immediately point out all the flaws with it. It seems like it’s usually people with some kind of anxiety, and I can’t tell which direction the causation goes. They’ll play some new piece on the guitar and as soon as they finish their face scrunches up like they smelled something bad and they point out how many notes they missed on that third line, and then someone else in the room will say something like “oh yeah, I noticed that” and the player will look even more frustrated with themselves. Some amount of this seems useful for the learning process, but the people who can make mistakes and laugh about it seem happier to play more guitar.
I notice this even more when trying to brainstorm or come up with lots of ideas. I’ll watch someone sit silently for whole minutes, and then write one idea down. See, what’s going on in my head is that I’m earning points for every idea I come up with, even the bad ones. Another idea, another point. Evaluation of whether it’s a good idea is a separate process and has to be. The points can be awarded very fast and entirely mentally and still have a tiny positive ding of reward. Who doesn’t like points!
“Hermione,” Harry said seriously, as he started to dig down into the red-velvet pouch again, “don’t punish yourself when a bright idea doesn’t work out. You’ve got to go through a lot of flawed ideas to find one that might work. And if you send your brain negative feedback by frowning when you think of a flawed idea, instead of realizing that idea-suggesting is good behavior by your brain to be encouraged, pretty soon you won’t think of any ideas at all.”
Reward yourself. If you punish yourself for trying things and not being perfect, you learn not to try things.
III.
You know what else is fast? Smiling. For a while I was spending a lot of time studying human facial expressions. It felt like every other week I’d run across some news article or another promising positive cheer and easy access to good mood if I smiled more. Experimenting on myself gave mixed results; done without the benefit of complicated scientific measurement apparatus, this boiled down to trying to smile and then trying to figure out if I felt better. That’s practically begging for the placebo effect to come and stomp all over any attempts at data.
The success I found came from being able to identify other people’s smiles. Before I knew how to notice when people were enjoying themselves, I had a much harder time improving a whole range of social skills. After figuring out how to tell (at least most of the time) when people were having a good time, there was a period of rapid improvement where I’d try a joke and realize it didn’t land right or offer to help and be greeted with a genuine smile of relief. That ability to start getting rapid positive feedback was the inflection point for me in becoming a friendly, fun person to be around instead of the irritating arrogant person I was before.
(I’m still kind of arrogant, but reports suggest I’m much less irritating about it.)
You can even do this to other people. See, if I smile when someone offers to help with the other end of the table I’m moving or grin when I see them, they might get a tiny burst of reward. This is good. Especially for friends and partners, this kind of quick feedback is a useful tiny cookie for being around me and helping out. Of course, I’m not averse to using actual cookies either. Most of the time when I attend someone else’s rationalist meetup in Boston, I bring fresh baked bread and offer a slice to people early on in the conversation. If I meet someone on a dating app, I often try to find out what food they like before the first or second date and either arrange to meet at a restaurant that serves it or just make the food at home and bring it myself. If I find out that a friend or date hates alcohol and loud noises, why would I take them to a bar?
Reward your friends. If you punish them for coming to hang out with you, they learn hanging out with you means drinking bitter things and getting overwhelmed by noise. (Of course if they like alcohol and loud music, then the bar becomes a better option!)
IV.
Now comes the section I expect to be controversial. I think you should reward your enemies.
To change someone’s mind, it’s useful if they’ll talk to me or at least read the things I write. If talking to me and reading the things I write is unpleasant, they won’t do that as much. Accordingly, if there are unnecessary unpleasant things for them in that conversation, I’m pretty willing to fix that.
Being called rude names or insulted is unpleasant for the vast majority of people. It doesn’t matter if you are right and they are wrong. Since being told that you’re wrong is unpleasant for most people, any conversation in which you want to convince someone they’re wrong starts off in a deficit.
“Thank you for taking the time to talk with me about this!” is usually true, costs you nothing but a few seconds’ breath, and is nice to hear. Don’t use it solely when you’re about to tell someone they’re wrong; they can and will form associations about that. If there are particular slogans you expect them to have a negative association with, use different words where you can. (The skill of rationalist taboo is useful here.) Don’t yell or resort to cheap memes about how dumb they are.
That’s just avoiding negative reinforcement. When they offer up a crux, reward that a bit by digging into it. When they make a well structured point—a locally valid argument even if you disagree with the conclusion, citing a source whose provenance is good and isn’t taken out of context even if you don’t like the way it’s set up—let them know you think it’s a well structured point. Make engaging with you pleasant and enjoyable.
What if your enemies do more than argue with you? What if they’re threatening violence, or have already spilled blood?
I’m not saying to turn the other cheek. I am suggesting that you want them to find not shooting at you more pleasant than shooting at you. “Lay down arms and surrender, our prisoners of war are well cared for” is actually a compelling pitch in the right circumstances. “Lay down arms and surrender, our torture chambers are well stocked and this bed of rusty nails is just your size” really isn’t. That’s how you encourage them to fight to the death.
Reward your enemies. If you punish them for leaving you alone or treating with you in good faith, they will be less likely to back down or do anything other than attack.
V.
The kind of reward can be quite flexible. From slivers of shaved chocolate (a small bowl of which sits at my desk during the workday) to cookies (I’m partial to no-bake oatmeal myself) to smiles (oh, so many choices from the shy and hesitant to the fearless and bold) to ceasefire agreements and humanitarian aid (look, I told my maths teacher I was sorry) the important thing is to give positive feedback for things that you want more of, as quickly as possible.
The call to action on this essay is to think more about what behaviors you incentivize in yourself and those around you. Try not to teach people to do things you don’t want to happen by accident. I suggest taking a moment right now to think about what you want more of, and next time you see it happen, to deliberately encourage more of it.
Also uh, LessWrong has the most reasonable upvote and engagement metrics of any social site I’m aware of, but I do pay some attention to what people are interested in reading. Solving for that equilibrium is left as an exercise for the reader.
Did you use reinforcement to make yourself write a post every weekday?
A little. I like the feeling of correct execution when a phrase comes out that feels right to me. I like upvotes and people commenting saying they liked what I wrote. I also have a chocolate bar by my desk I shave slivers off when I finish something on my todo list.
I read “Thinking By The Clock” in my inbox and ended up here (…and with ten other open tabs to read up on). To apply what I learned about fighting the bystander effect and dispensing cookies: thanks for writing—reading your posts has measurably improved my day!
(Turns out my reading diet was deficient in LessWrong-ium—a necessary nutrient—and incessantly checking Hacker News can be a symptom of LessWrong-ium deficiency.)
You are welcome! Thank you for the kind words :)
I strongly disagreed with all of this!
.
.
.
(cookie please!)
Have an internet cookie for stating there’s a disagreement! Can you elaborate a little more?
I actually fundamentally agree with most/all of it, I just wanted a cookie :)
Can the relevant social norms be compactly formalized? It might sound vaguely like: “Only model people in ways they endorse.”
As a general (though not absolute) rule, I think the answer to “Can the . . . social norms be compactly formalized?” is “no.” People are complicated.
Actually answering the question: I plan to keep modeling people in whatever depth I can. I think having better predictions of what someone is about to do is just straightforwardly useful, even if they don’t endorse it. Is model the word you mean to use there?