[Altruist Support] How to determine your utility function
Follows on from HELP! I want to do good.
What have I learned since last time? I’ve learned that people want to see an SIAI donation; I’ll make it as soon as PayPal lets me. I’ve learned that people want more “how” and maybe more “doing”; I’ll write a doing post soon, but I have this and two other background posts to write first. I’ve learned that there’s a nonzero level of interest in my project. I’ve learned that there’s a diversity of opinions, which suggests that if I’m wrong, I’m at least wrong in an interesting way. I may have learned that signalling low status (to avoid intimidating outsiders) is a worse strategy than signalling that I know what I’m talking about. And I’ve learned that I’m prone to answering a question other than the one that was asked.
Somewhere in the Less Wrong archives there is a deeply shocking, disturbing post. It’s called Post Your Utility Function.
It’s shocking because basically no one had any idea. At the time I was still learning, but I knew that having a utility function was important: it was what made everything else make sense. But I didn’t know what mine was supposed to be. And neither, apparently, did anyone else.
Eliezer commented, ‘In prescriptive terms, how do you “help” someone without a utility function?’ This post is an attempt to start answering that question.
First, what a utility function is and what it isn’t. It belongs to instrumental rationality, not epistemic rationality; it is not part of the territory. Don’t expect it to correspond to anything physical.
Also, it’s not supposed to model your revealed preferences (that is, your current behavior). If it did, that would mean you were already perfectly rational. If you don’t feel that’s the case, then you need to look beyond your revealed preferences, toward what you really want.
In other words, the wrong way to determine your utility function is to think about what decisions you have made, or feel that you would make, in different situations. Which means there’s a chance, just a chance, that up until now you’ve been doing it completely wrong: you haven’t been getting what you wanted.
So in order to play the utility game, you need humility. You need to accept that you might not have been getting what you want, and that realizing it might hurt. All those little subgoals might just have been getting you nowhere more quickly.
So only play if you want to.
The first thing is to understand the domain of the utility function. It’s defined over entire world histories. You consider everything that has happened, and will happen, in your life and in the rest of the world. And out of that pops a number. That’s the idea.
This complexity means that utility functions generally have to be defined somewhat vaguely (unless you’re trying to build an AI). It also allows you a lot of flexibility in deciding what you really value.
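If it helps to see the shape of that in code, here’s a minimal sketch. The fields of WorldHistory are placeholders of my own invention (a real world history could never be written out in full), and the numbers mean nothing; the type signature is the whole point:

```python
from dataclasses import dataclass

@dataclass
class WorldHistory:
    """Stand-in for "everything that has happened and will happen".
    These fields are illustrative placeholders only."""
    happy_life_years: float
    humanity_survives: bool

def utility(history: WorldHistory) -> float:
    """The whole job of a utility function: an entire world history in, one number out."""
    score = history.happy_life_years
    if not history.humanity_survives:
        score -= 1e9  # placeholder penalty, purely illustrative
    return score
```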
The second thing is to think about your preferences. Set up some thought experiments to decide whether you prefer this outcome or that outcome. Don’t think about what you’d actually do if put in a position to decide between them; then you’d be worrying about the social consequences of making the “unethical” decision. If you value things other than your own happiness, don’t ask which outcome you’d be happier in. Instead just ask: which outcome seems preferable? Which would you consider good news, and which bad news?
You can start writing things down if you like. One of the big things you’ll need to think about is how much you value yourself versus everyone else. But this may matter less than you think, for reasons I’ll get into later.
The third thing is to think about preferences between uncertain outcomes. This is somewhat technical, and I’d advise a shut-up-and-multiply approach: weight each outcome’s utility by its probability. (You can try to go against that if you like, but you have to be careful not to end up in weirdness such as getting different answers depending on whether you phrase something as one big decision or as a series of identical little decisions.)
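Here’s a minimal sketch of what shut-up-and-multiply buys you, using a made-up square-root utility over dollars (every number here is invented for illustration). The assertion at the end is the point: expected utility gives the same answer whether you treat two identical bets as one big compound lottery or work through them one at a time.

```python
import itertools

# A lottery is a list of (probability, outcome) pairs.
def expected_utility(lottery, u):
    """Shut up and multiply: weight each outcome's utility by its probability and sum."""
    return sum(p * u(x) for p, x in lottery)

# Toy utility over dollars, purely illustrative.
u = lambda dollars: dollars ** 0.5

# One small bet: 50% chance of $100, 50% chance of nothing.
small_bet = [(0.5, 100.0), (0.5, 0.0)]

# Framing 1: two independent copies of the bet, viewed as one big compound lottery.
big_lottery = [(p1 * p2, x1 + x2)
               for (p1, x1), (p2, x2) in itertools.product(small_bet, small_bet)]

# Framing 2: evaluate the second bet conditional on each result of the first,
# then average over the first (a series of little decisions).
staged = sum(p1 * expected_utility([(p2, x1 + x2) for p2, x2 in small_bet], u)
             for p1, x1 in small_bet)

assert abs(expected_utility(big_lottery, u) - staged) < 1e-9  # same answer either way
```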
The fourth thing is to ask whether this preference system satisfies the von Neumann-Morgenstern axioms. If it’s at all sane, it probably will. (Again, this is somewhat technical).
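For reference (this is standard textbook material, not anything specific to my project), the four axioms, stated over lotteries $L$, $M$, $N$ with preference relation $\succeq$:

```latex
\begin{itemize}
  \item \textbf{Completeness:} for all $L, M$, either $L \succeq M$ or $M \succeq L$.
  \item \textbf{Transitivity:} if $L \succeq M$ and $M \succeq N$, then $L \succeq N$.
  \item \textbf{Continuity:} if $L \succeq M \succeq N$, there is some $p \in [0,1]$
        such that $pL + (1-p)N \sim M$.
  \item \textbf{Independence:} $L \succeq M$ if and only if
        $pL + (1-p)N \succeq pM + (1-p)N$ for every lottery $N$ and every $p \in (0,1]$.
\end{itemize}
```

The theorem then says that any preference system satisfying these four can be represented by maximizing the expected value of some utility function, which is what makes the shut-up-and-multiply rule in the previous step legitimate.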
The last thing is to ask yourself: if I prefer outcome A over outcome B, do I want to act in such a way that I bring about outcome A? (continue only if the answer here is “yes”).
That’s it: you now have a shiny new utility function, and I want to help you maximize it. (Though it can grow and develop and change along with you; I want this to be a speculative process, not one in which you suddenly commit to an immutable life goal.)
You probably don’t feel that anything has changed; you’re probably feeling and behaving exactly as you did before. How to change that is something I’ll have to leave for a later post. Once you start really feeling that you want to maximize your utility, things will start to happen. You’ll have something to protect.
Oh, you wanted to know my utility function? It goes something like this:
It’s the sum of the things I value. Once a person is created, I value that person’s life; I also value their happiness, fun and freedom of choice. I assign negative value to that person’s disease, pain and sadness. I value concepts such as beauty and awesomeness. I assign a large additional negative value to the extinction of humanity. I weigh the happiness of myself and those close to me more highly than that of strangers, and this asymmetry becomes more pronounced when my overall well-being is low.
Four points: It’s actually going to be a lot more complicated than that. I’m aware that it’s not quantitative and no terminology is defined. I’m prepared to change it if someone points out a glaring mistake or problem, or if I just feel like it for some reason. And people should not start criticizing my behavior for not adhering to this, at least not yet. (I have a lot of explaining still to do).
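To make the shape of that concrete (and only the shape: every name and number below is a placeholder, not something I’m committing to), here’s roughly what it might look like as code:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Person:
    alive: bool
    happiness: float
    fun: float
    freedom: float
    disease: float
    pain: float
    sadness: float
    closeness_to_me: float  # 1.0 for me, near 0 for a stranger

def person_term(p: Person) -> float:
    """Everything I value or disvalue about one person, as a single number."""
    positives = (1.0 if p.alive else 0.0) + p.happiness + p.fun + p.freedom
    negatives = p.disease + p.pain + p.sadness
    return positives - negatives

def my_utility(people: List[Person], beauty: float, awesomeness: float,
               humanity_extinct: bool, my_wellbeing: float) -> float:
    total = 0.0
    for p in people:
        # Weigh myself and those close to me more heavily, and more so
        # when my own well-being is low.
        asymmetry = 1.0 + max(0.0, 1.0 - my_wellbeing)
        weight = 1.0 + p.closeness_to_me * asymmetry
        total += weight * person_term(p)
    total += beauty + awesomeness
    if humanity_extinct:
        total -= 1e9  # the large additional negative value for extinction
    return total
```

None of the weights mean anything yet; the point is just that “the sum of the things I value” really does cash out as a function from world histories to numbers.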