Said Achmiz comments on Underappreciated points about utility functions (of both sorts)

Said Achmiz 4 Jan 2020 18:18 UTC
6 points

But if we’re stating idealized preferences (including a moral theory), then these idealized preferences had better be consistent—and not literally just consistent, but obeying rationality axioms to avoid stupid stuff.

“Stupid” by what standard?

Surely it can’t be “by the standard of your preferences”—because your preferences are what’s being evaluated. What, then? Is there some other standard? But why should we accept this standard, if it is not contained within, or implied by, our preferences?

It is not at all obvious to me that the quoted claim is coherent or true. I should like to see it argued for, explicitly.

Relatedly:

So if your moral theory is given by an unbounded utility function, then it is not, in fact, a correct description of anyone’s idealized preferences, no matter how much you insist it is, because you’re saying that people’s idealized (not real!) preferences are, essentially, inconsistent.

Suppose I am saying exactly this. And what of it? What’s wrong with this?

an incosistent [sic] (or technically consistent but having obvious perversities) moral system is no good

Why not?

In other words: suppose that we have a choice between a moral system that fails to properly capture all of our preferences (but is consistent/etc.), and a moral system which is inconsistent (or perverse etc.), but captures all of our preferences. You say that we should obviously choose the former. Why? This choice does not seem obvious to me.
- cousin_it 4 Jan 2020 19:42 UTC
  3 points
  Parent
  Building FAI that will correctly optimize something inconsistent seems like an even more daunting task than building FAI :-)
  - Said Achmiz 5 Jan 2020 3:30 UTC
    3 points
    Parent
    That’s hardly a reason to modify our morality…!
    - cousin_it 5 Jan 2020 11:05 UTC
      2 points
      Parent
      Sure, but that’s a reason to research consistent values that are close to ours, so we have something to program into a certain kind of FAI. That’s why people research “idealizing values”, and I think it’s a worthwhile direction. Figuring out how to optimize inconsistent values could be another direction, they are not mutually exclusive.
      - Said Achmiz 5 Jan 2020 11:25 UTC
        2 points
        Parent
        I think that’s a very dangerous direction. It seems like it would be all too easy for judgments of value ‘closeness’ to be made on the basis of possibility/convenience/etc. (i.e., “how easy it would be to program this into an FAI”), rather than… unbiased evaluation.
        
        Furthermore, it seems to me that if you take any set of values “close to” your own, and then optimize for those values, that optimization itself will make these values less and less close to yours. (This would be especially true if there is no practical/meaningful way to optimize your actual values!)
        
        These two things put together, which complement (in a negative and dangerous way) each other, make me very leery of the “research consistent values that are close to ours” approach.
        
        cousin_it 5 Jan 2020 11:46 UTC
        3 points
        Parent
        I think it makes sense to worry about value fragility and shoehorning, but it’s a cost-benefit thing. The benefits of consistency are large: it lets you prove stuff. And the costs seem small to me, because consistency requires nothing more than having an ordering on possible worlds. For example, if some possible world seems ok to you, you can put it at the top of the ordering. So assuming infinite power, any ok outcome that can be achieved by any other system can be achieved by a consistent system.
        
        And even if you want to abandon consistency and talk about messy human values, OP’s point still stands: unbounded utility functions are useless. They allow “St Petersburg inconsistencies” and disallow “bounded inconsistencies”, but human values probably have both.
        
        Said Achmiz 5 Jan 2020 12:13 UTC
        2 points
        Parent
        
        consistency requires nothing more than having an ordering on possible worlds. For example, if some possible world seems ok to you, you can put it at the top of the ordering. So assuming infinite power, any ok outcome that can be achieved by any other system can be achieved by a consistent system
        
        This is an interesting point. I will have to think about it, thanks.
        
        And even if you want to abandon consistency and talk about messy human values, OP’s point still stands: unbounded utility functions are useless.
        
        To be clear, I take no position on this point in particular. My disagreements are as noted in my top-level comment—no more nor less. (You might say that I am questioning various aspects of the OP’s “local validity”. The broader point may stand anyway, or it may not; that is to be evaluated once the disagreements are resolved.)
- Mark_Friedenbach 5 Jan 2020 4:28 UTC
  2 points
  Parent
  Preferences cannot be inconsistent, only stated preferences which are not the same thing. So that seems like an impossible choice...
  - Said Achmiz 5 Jan 2020 4:41 UTC
    2 points
    Parent
    
    Preferences cannot be inconsistent
    
    This is the first I’m hearing of this. As far as I’m aware, in all previous discussions on LW, inconsistent preferences have been taken to be, not only possible, but actual (and possessed by most humans). On what do you base your claim?
    - Mark_Friedenbach 5 Jan 2020 16:50 UTC
      4 points
      Parent
      I’m responding here rather than deeper in the thread. It’s not the whole response I wanted to do, which probably deserves an entire sequence, but it gets the ball rolling at least.
      My whole quote was:
      Preferences cannot be inconsistent, only stated preferences which are not the same thing.
      Let’s break that down a bit. First of all I’m taking “preferences” to be shorthand for “preferences over outcomes,” which is to say a qualitative ranking of possible future worlds: E.g. getting no ice cream is bad, me getting ice cream is good, everyone getting ice cream is better, etc.
      To quantify this, you can assign numeric values to outcomes in order of increasing preference. Now we have a utility function that scores outcomes. But despite adding numbers we are really just specifying a ranked distribution of outcomes based on preference. U(no ice cream) < U(I have ice cream) < U(everyone has ice cream).
      Now what does it mean for preferences to be inconsistent? The simplest example would be the following:
      U(no ice cream) < U(I have ice cream)
      U(I have ice cream) < U(everyone has ice cream)
      U(everyone has ice cream) < U(no ice cream)
      If you’ve followed closely so far, alarm bells should be going off. This supposed ordering is nonsensical. No ice cream is worse than just me having ice cream, which is worse than everyone having ice cream… which is worse than no ice cream at all? We’ve looped around! There’s no way to maximize this ordering in order to find the “best” outcome. All outcomes are “better” than all other outcomes, and at the same time “worse” as well.
      Inconsistent utility functions (“preferences”) are nonsensical. You can’t reason about them. You can’t use them to solve problems. And when you look at them in detail it doesn’t really make sense as a preference either: how can one outcome be strictly better than another, and also strictly worse at the same time? Make up your mind already!
      But of course when we consider a real problem like, say, abortion rights, our preferences feel conflicted. I would argue this is just what it feels like on the inside to have a weighted utility function, with some terms positive and some terms negative. If we consider the case of abortion, we may feel bad for the terminated potential human represented by the fetus, but feel good about the woman’s control of their body or the prevention of an unhappy childhood stemming from an unwanted pregnancy. We also know that every real abortion circumstance is a complex jumble of a bunch of other factors, far too many to write here or even consider in one sitting in real life. So what we end up feeling is:
      (positive term) + (negative term) + (...ugh...)
      Most people seek reasons to ignore either the positive or negative terms so they can feel good or righteous about themselves for voting Blue or Green. But I would argue that in an issue which feels unresolved and conflicted, the term which dominates in magnitude is really the …ugh… term. We can feel preferences for and against, but we are unable to weigh the balance because the gigantic unresolved pool of …ugh… we have yet to wade into.
      What this feels like from the inside is that your preferences are inconsistent: you feel simultaneous pull towards or against multiple outcomes, without a balance of weight to one side or the other.
      And yes like you mentioned in another comment what I think you should do is delve deeper and explore that …ugh… field. You don’t have to uncover literally every piece of evidence, but you do have to keep at it until it is balanced by the revealed positive and negative terms, as which point you will no longer feel so conflicted and your preferences will be clear.
      Now inconsistent preferences often show up when considering things like virtue ethics: “Stealing is wrong. We want to be good people, so we should never steal.” But what if I need to steal bread to feed my starving family? Providing for my family is also virtuous, is it not? “Inconsistency!” you shout.
      No. I defy that notion. Nobody is born with a strong moral instinct against such a complex social virtual such as a prohibition of theft, which requires a concept of other minds, a notion of property, and parent values of fairness and equality. At best you are perhaps born with a spectrum of hedonic likes and dislikes, a few social instincts, and based on child development trajectories, maybe also some instinctual values relating to fairness and social status. As you grow up you develop and/or have imprinted upon you some culturally transmitted terminal values, as well as the non-linear weights to these terminal values which make up your own personal utility function. Since you have to make choices day-to-day in which you evaluate outcomes, you also develop a set of heuristics for maximizing your utility functions: these are the actual reasons you have for the things you do, and include both terminal and instrumental values as well as functional heuristics. To an even smaller extent you also have a meta-level awareness of these goals and heuristics, which upon request you might translate into words. These are your stated preferences.
      In circumstances that you are likely to encounter as a rationalist (e..g no cult reprogramming), seeking reflective equilibrium should cause your instrumental goals and functional heuristics to more closely match your underlying utility function without changing the weights, thereby not altering your actual preferences, even if you do occasionally change your stated preferences as a result.
      This comment has already gotten super long, and I’ve run out of time. There’s more I wanted to say about on a more mathematical and AI basis about how seeking reflective equilibrium must always create instrumental values which reflect stable terminal values under reflection. But that would be a post equally long and more math heavy...
      What links here?
      Said Achmiz's comment on Underappreciated points about utility functions (of both sorts) by Sniffnoy (6 Jan 2020 0:26 UTC; 2 points)
      - Said Achmiz 5 Jan 2020 20:07 UTC
        2 points
        Parent
        You are talking, here, about preferences that are intransitive.
        
        The von Neumann–Morgenstern utility theorem specifies four axioms which an agent’s preferences must conform to, in order for said preferences to be formalizable as a utility function. Transitivity of preferences is one of these axioms.
        
        However, the VNM theorem is just a formal mathematical result: it says that if, and only if, an agent’s preferences comply with these four axioms, then there exists (up to positive affine transformation) a utility function which describes these preferences.
        
        The axioms are often described as rules that a “rational agent” must comply with, or as being axioms of “rationality”, etc., but this is a tendentious phrasing—one which is in no way implicit in the theorem (which, again, is only a formally proved result in mathematics), nor presupposed by the theorem. Whether compliance with the VNM axioms is normative (or, equivalently, whether it constitutes, or is required by, “rationality”) is thus an open question.
        
        (Note that whether the actual preferences of existing agents (i.e., humans) comply with the VNM axioms is not an open question—we know that they do not.)
        
        It may interest you to know that, of the four VNM axioms, transitivity is one which I (like you) find intuitively and obviously normative. I cannot see any good reason to have preferences that are intransitive upon reflection; this would be clearly irrational.
        
        But there are three other axioms: independence, continuity, and completeness. I do not find any of those three to be obviously normative. In fact, there are good reasons to reject each of the three. And my actual preferences do indeed violate at least the independence and continuity axioms.
        
        If you search through my comment history, you will find discussions of this topic dating back many years (the earliest, I think, would have been around 2011; the most recent, only a few months ago). My opinion has not materially shifted, over this period; in other words, my views on this have been stable under reflection.
        
        Thus we have the situation I have been describing: my preferences are “inconsistent” in a certain formal sense (namely, they are not VNM-compliant), and thus cannot be represented with a utility function. This property of my preferences is stable under reflection, and furthermore, I endorse it as normative.
        
        P.S.: There are certain other things in your comment which I disagree with, but, as far as I can tell, all are immaterial to the central point, so I am ignoring them.
        
        Mark_Friedenbach 6 Jan 2020 0:16 UTC
        2 points
        Parent
        Note that whether the actual preferences of existing agents (i.e., humans) comply with the VNM axioms is not an open question—we know that they do not.
        I defy the data. Give me a hard example please, or I don’t think there’s much benefit to continuing this.
        Said Achmiz 6 Jan 2020 0:26 UTC
        2 points
        Parent
        Certainly I can do this (in fact, you can find several examples yourself by, as I said, looking through my comment history—but yes, I’m willing to dig them up for you).
        
        But before I do, let me ask: what sorts of examples will satisfy you? After all, suppose I provide an example; you could then say: “ah, but actually this is not a VNM axiom violation, because these are not your real preferences—if you thought about it rationally, you would conclude that your real preferences should instead be so-and-so” (in a manner similar to what you wrote in your earlier comment). Then suppose I say “nope; I am unconvinced; these are definitely my real preferences and I refuse to budge on this—my preferences are not up for grabs, no matter what reasoning you adduce”. Then what? Would you, in such a case, accept my example as an existence proof of my claim? Or would you continue to defy the data?
        
        Mark_Friedenbach 6 Jan 2020 0:37 UTC
        2 points
        Parent
        Well I don’t know how I would react without seeing it, which is why I’m asking. But yes my better-odds expectation is that it will only be apparently inconsistent and we’d either be able to unravel the real underlying terminal values or convincingly show that the ramifications of the resulting inconsistency are not compatible with your preferences. If you think that’d be a waste of your time you’re free not to continue with this, with no assumed fault of course.
        Said Achmiz 6 Jan 2020 1:40 UTC
        2 points
        Parent
        Well, let’s say this: I will take some time (when I can, sometime within the next few days) to find some of the comments in question, but if it turns out that you do think that none of the claimed examples are sufficient, then I make no promises about engaging with the proposed “unraveling of real underlying terminal values” or what have you—that part I do think is unlikely to be productive (simply because there is usually not much to say in response to “no, these really are my preferences, despite any of these so-called ‘contradictions’, ‘incompatibilities’, ‘inconsistencies’, etc.”—in other words, preferences are, generally, prior to everything else^[1]).
        
        In the meantime, however, you might consider (for your own interest, if nothing else) looking into the existing (and quite considerable) literature on VNM axiom violations in the actual preferences of real-world humans. (The Wikipedia page on the VNM theorem should be a good place to start chasing links and citations for this.)
        
        ↩︎
        This, of course, avoids the issue of higher-order preferences, which I acknowledge is an important complicating factor, but which I think ought to be dealt with as a special case, and with full awareness of what exactly is being dealt with. (Robin Hanson’s curve-fitting approach is the best framework I’ve seen for thinking about this sort of thing.)
      - TAG 11 Jan 2020 17:50 UTC
        1 point
        Parent
        You are proving that if preferences are well-defined , they also need to be consistent.
        
        What does it feel like from the inside to have badly defined preferences? Presumably it feels like sometimes being unable to make decisions, which you report is the case.
        
        You can’t prove that preferences are consistent without first proving they are well defined.
    - Mark_Friedenbach 5 Jan 2020 5:27 UTC
      2 points
      Parent
      The search term you want is “reflective equilibrium,” which reduces our beliefs about our preferences to a fixed point under the transform of incrementally resolving inconsistencies. The output is necessarily a single consistent set of preferences in practice. (In theory there could be a cyclic group representing an unresolvable inconsistency, but there are reasonable priors that can be chosen to avoid.)
      - Said Achmiz 5 Jan 2020 7:19 UTC
        2 points
        Parent
        Yes, I’m familiar with the term. But what of it? Just because I can run some procedure (whether once or iteratively) on my preferences and get some output, which may be some altered set of preferences… doesn’t mean… well, anything. Yes, I can do this, if I were so inclined. (Or, alternatively, I could also… not do this.) What of it? How does that mean that I currently don’t have the preferences that I do, in fact, have? How does it prevent my current, actual preferences from being inconsistent?
        
        Mark_Friedenbach 5 Jan 2020 8:00 UTC
        2 points
        Parent
        Seeking reflective equilibrium isn’t meant to change your preferences. It is meant to refine or alter cached thoughts, which play a role in the production of stated preferences.
        
        E.g. if in ask ‘is it okay to kill someone?’ And you say “no, never” with conviction. Then I follow with ‘what about self-defense?’ And you reply “ok, in self defense or the defense of others, but only if there is no other option.” Did your preferences change?
        
        What I’m arguing is that you didn’t change your preferences, but rather updated your stated preferences based on a cache flush I initiated with my line of questioning.
        
        Said Achmiz 5 Jan 2020 8:49 UTC
        2 points
        Parent
        I see. But then, whence this claim:
        
        The output is necessarily a single consistent set of preferences in practice.
        
        This doesn’t seem necessary at all, to me. Why do you say it is?
        
        In fact, in practice—to take an immediate example—the output of reflection in my case has been to demonstrate that my preferences do not conform to the VNM axioms (and therefore cannot be represented with a utility function). Indeed this reflection process did not change my preferences, as you say. And yet the output was not ‘consistent’ in the way we’re discussing!
        
        Would you say that I just haven’t reflected enough, and that further reflection would reveal that actually, my real preferences are, and have always been, ‘consistent’ in this way? But how would we verify this claim? (How much more reflection is ‘enough’?) Or how else would you resolve this apparent falsification of your claim?
        
        Mark_Friedenbach 5 Jan 2020 12:15 UTC
        2 points
        Parent
        I would like to give you a longer response, but I’m on the go and about to enter a long week of work meetings. Remind me if you don’t get a longer reply (and you still care).
        
        I think it would help though to clarify: what do you mean by: “feels inconsistent?“ I hope it is okay to ask you a short question about the meaning of a common word :) It would help to have an example.
        
        Said Achmiz 5 Jan 2020 13:21 UTC
        2 points
        Parent
        Er, sorry, but I didn’t use the phrase “feels inconsistent” (nor any other construction involving “feel”)… what are you referring to?
        
        Mark_Friedenbach 5 Jan 2020 13:50 UTC
        2 points
        Parent
        Sorry that was sloppy of me:
        
        And yet the output was not ‘consistent’ in the way we’re discussing!
        
        Said Achmiz 5 Jan 2020 15:19 UTC
        2 points
        Parent
        Oh—I wasn’t saying anything new there; I was just referring back to the first sentence of that paragraph:
        
        In fact, in practice—to take an immediate example—the output of reflection in my case has been to demonstrate that my preferences do not conform to the VNM axioms (and therefore cannot be represented with a utility function).
- Said Achmiz 4 Jan 2020 18:21 UTC
  2 points
  Parent
  Devil’s advocacy:
  
  One answer to the above might be “we have a meta-preference to have a consistent morality”.
  
  Well, fair enough, if so. However, if that is the only answer—if this this our only reason for preferring the consistent-but-inaccurate moral system to the accurate-but-inconsistent one—then we ought to get clear on this fact, first. Having our choice in such a dilemma be driven only by a meta-preference, and not by any other considerations, is a special case, and must be unambiguously identified before we attempt to resolve the issue.
  What links here?
  - Said Achmiz's comment on Underappreciated points about utility functions (of both sorts) by Sniffnoy (6 Jan 2020 1:40 UTC; 2 points)
  - Mark_Friedenbach 5 Jan 2020 5:53 UTC
    2 points
    Parent
    We have to make choices, and it is not possible to make choices to maximize outcomes if we don’t have a consistent utility function. Since having a consistent utility function is a hard requirement of simply being an agent and having any effect on the world, I think it’s a reasonable requirement to have.
    
    People say they have inconsistent utility functions. But then they go and make real life decisions anyway, so their actions imply a consistent utility function. Actions speak louder than words...
    - Said Achmiz 5 Jan 2020 7:16 UTC
      2 points
      Parent
      
      having a consistent utility function is a hard requirement of simply being an agent and having any effect on the world
      
      I don’t know what you mean by this; it seems plainly false. I have effects on the world all the time (as do most people), and I don’t, as far as I can tell, have a consistent utility function (nor do most people).
      
      People say they have inconsistent utility functions. But then they go and make real life decisions anyway, so their actions imply a consistent utility function.
      
      But just the fact of making decisions doesn’t imply a utility function. What can you mean by this…?