We Don’t Have a Utility Function
Related: Pinpointing Utility
If I ever say “my utility function”, you could reasonably accuse me of cargo-cult rationality; trying to become more rational by superficially imitating the abstract rationalists we study makes about as much sense as building an air traffic control station out of grass to summon cargo planes.
There are two ways an agent could be said to have a utility function:
- It could behave in accordance with the VNM axioms; always choosing in a sane and consistent manner, such that “there exists a U”. The agent need not have an explicit representation of U.
- It could have an explicit utility function that it tries to expected-maximize. The agent need not perfectly follow the VNM axioms all the time. (Real bounded decision systems will take shortcuts for efficiency and may not achieve perfect rationality, like how real floating point arithmetic isn’t associative).
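As a quick illustration of that floating point aside, here is a minimal sketch in Python (standard IEEE 754 doubles); regrouping the same three additions changes the answer:

```python
# Floating point addition is not associative: every operation rounds,
# so the grouping of the same three numbers changes the result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

print(left == right)  # False
```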
Neither of these is true of humans. Our behaviour and preferences are not consistent and sane enough to be VNM, and we are generally quite confused about what we even want, never mind having reduced it to a utility function. Nevertheless, you still see the occasional reference to “my utility function”.
Sometimes “my” refers to “abstract me who has solved moral philosophy and/or become perfectly rational”, which at least doesn’t run afoul of the math, but is probably still wrong about the particulars of what such an abstract idealized self would actually want. But other times it’s a more glaring error, like using “utility function” as shorthand for “entire self-reflective moral system”, which may not even be VNMish.
But this post isn’t really about all the ways people misuse terminology, it’s about where we’re actually at on the whole problem for which a utility function might be the solution.
As above, I don’t think any of us have a utility function in either sense; we are not VNM, and we haven’t worked out what we want enough to make a convincing attempt at trying. Maybe someone out there has a utility function in the second sense, but I doubt that it actually represents what they would want.
Perhaps then we should speak of what we want in terms of “terminal values”? For example, I might say that it is a terminal value of mine that I should not murder, or that freedom from authority is good.
But what does “terminal value” mean? Usually, it means that the value of something is not contingent on or derived from other facts or situations, like for example, I may value beautiful things in a way that is not derived from what they get me. The recursive chain of valuableness terminates at some set of values.
There’s another connotation, though, which is that your terminal values are akin to axioms; not subject to argument or evidence or derivation, and simply given, that there’s no point in trying to reconcile them with people who don’t share them. This is the meaning people are sometimes getting at when they explain failure to agree with someone as “terminal value differences” or “different set of moral axioms”. This is completely reasonable, if and only if that is in fact the nature of the beliefs in question.
About two years ago, it very much felt like freedom from authority was a terminal value for me. Those hated authoritarians and fascists were simply wrong, probably due to some fundamental neurological fault that could not be reasoned with. The very prototype of “terminal value differences”.
And yet here I am today, having been reasoned out of that “terminal value”, such that I even appreciate a certain aesthetic in bowing to a strong leader.
If that was a terminal value, I’m afraid the term has lost much of its meaning to me. If it was not, if even the most fundamental-seeming moral feelings are subject to argument, I wonder if there is any coherent sense in which I could be said to have terminal values at all.
The situation here with “terminal values” is a lot like the situation with “beliefs” in other circles. Ask someone what they believe in most confidently, and they will take the opportunity to differentiate themselves from the opposing tribe on uncertain, controversial issues: god exists, god does not exist, racial traits are genetic, race is a social construct. The pedantic answer, of course, is that the sky is probably blue, and that that box over there is about a meter long.
Likewise, ask someone for their terminal values, and they will take the opportunity to declare that those hated greens are utterly wrong on morality, and blueness is wired into their very core, rather than the obvious things like beauty and friendship being valuable, and paperclips not.
So besides not having a utility function, those aren’t your terminal values. I’d be surprised if even the most pedantic answer weren’t subject to argument; I don’t seem to have anything like a stable and non-negotiable value system at all, and I don’t think that I am even especially confused relative to the rest of you.
Instead of a nice consistent value system, we have a mess of intuitions and heuristics and beliefs that often contradict, fail to give an answer, and change with time and mood and memes. And that’s all we have. One of the intuitions is that we want to fix this mess.
People have tried to do this “Moral Philosophy” thing before, myself included, but it hasn’t generally turned out well. We’ve made all kinds of overconfident leaps to what turn out to be unjustified conclusions (utilitarianism, egoism, hedonism, etc), or just ended up wallowing in confused despair.
The zeroth step in solving a problem is to notice that we have a problem.
The problem here, in my humble opinion, is that we have no idea what we are doing when we try to do Moral Philosophy. We need to go up a meta-level and get a handle on Moral MetaPhilosophy. What’s the problem? What are the relevant knowns? What are the unknowns? What’s the solution process?
Ideally, we could do for Moral Philosophy approximately what Bayesian probability theory has done for Epistemology. My moral intuitions are a horrible mess, but so are my epistemic intuitions, and yet we more-or-less know what we are doing in epistemology. A problem like this has been solved before, and this one seems solvable too, if a bit harder.
It might be that when we figure this problem out to the point where we can be said to have a consistent moral system with real terminal values, we will end up with a utility function, but on the other hand, we might not. Either way, let’s keep in mind that we are still on rather shaky ground, and at least refrain from believing the confident declarations of moral wisdom that we so like to make.
Moral Philosophy is an important problem, but the way is not clear yet.
Stanovich’s paper on why humans are apparently worse at following the VNM axioms than some animals has some interesting things to say, although I don’t like the way it says them. I quit halfway through the paper out of frustration, but what I got out of the paper (which may not be what the paper itself was trying to say) is more or less the following: humans model the world at different levels of complexity at different times, and at each of those levels different considerations come into play for making decisions. An agent behaving in this way can appear to be behaving VNM-irrationally when really it is just trying to efficiently use cognitive resources by not modeling the world at the maximum level of complexity all the time. Non-human animals may model the world at more similar levels of complexity over time, so they behave more VNM-rationally even if they have less overall optimization power than humans.
A related consideration, which is more about the methodology of studies claiming to measure human irrationality, is that the problem you think a test subject is solving is not necessarily the problem they’re actually solving. I guess a well-known example is when you ask people to play the prisoner’s dilemma but in their heads they’re really playing the iterated prisoner’s dilemma.
And another point: an agent can have a utility function and still behave VNM-irrationally if computing the VNM-rational thing to do given its utility function takes too much time, so the agent computes some approximation of it. It’s a given in practical applications of Bayesian statistics that Bayesian inference is usually intractable, so it’s necessary to compute some approximation to it, e.g. using Monte Carlo methods. The human brain may be doing something similar (a possibility explored in Lieder-Griffiths-Goodman, for example).
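To make the approximation point concrete, here is a toy Monte Carlo sketch; it is not meant to be what the brain (or the Lieder-Griffiths-Goodman model) actually does, just the general trick of replacing an exact expectation with an average over samples, trading accuracy for compute:

```python
# Toy Monte Carlo estimate of an expectation: E[x^2] under a standard
# normal is exactly 1.0, but we pretend we can only draw samples.
import random

def monte_carlo_estimate(n_samples):
    total = 0.0
    for _ in range(n_samples):
        x = random.gauss(0.0, 1.0)  # one sample from the standard normal
        total += x * x
    return total / n_samples

print(monte_carlo_estimate(100))     # cheap, rough
print(monte_carlo_estimate(100000))  # more compute, close to 1.0
```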
(Which reminds me: we don’t talk anywhere near enough about computational complexity on LW for my tastes. What’s up with that? An agent can’t do anything right if it can’t compute what “right” means before the Sun explodes.)
I agree with this concern (and my professional life is primarily focused on heuristic optimization methods, where computational complexity is huge).
I suspect it doesn’t get talked about much here because of the emphasis on intelligence explosion, missing AI insights, provably friendly, normative rationality, and there not being much to say. (The following are not positions I necessarily endorse.) An arbitrarily powerful intelligence might not care much about computational complexity (though it’s obviously important if you still care about marginal benefit and marginal cost at that level of power). Until we understand what’s necessary for AGI, the engineering details separating polynomial, exponential, and totally intractable algorithms might not be very important. It’s really hard to prove how well heuristics do at optimization, let alone robustness. The Heuristics and Biases literature focuses on areas where it’s easy to show humans aren’t using the right math, rather than how best to think given the hardware you have, and some of that may be deeply embedded in the LW culture.
I think that there’s a strong interest in prescriptive rationality, though, and if you have something to say on that topic or computational complexity, I’m interested in hearing it.
Right, this is an important point that could use more discussion.
On closer inspection, a lot of the “irrationalities” are either rational in a higher-level game, or to be expected given the inability of people to “feel” abstract facts that they are told.
That said, the inability to properly incorporate abstract information is quite a rationality problem.
I’ve made this point quite a few times, here and here.
Depends; sometimes this is actually a decent way to avoid believing every piece of abstract information one is presented with.
I spent a large chunk of Sunday and Monday finally reading Death Note and came to appreciate how some people on LW can think that agents meticulously working out each other’s “I know that you know that I know” and then acting so as to interact with their simulations of each other, including their simulations of simulating each other, can seem a reasonable thing to aspire to. Even if actual politicians and so forth seem to do it by intuition, i.e., much more in hardware.
Have you ever played that thumb game where you stand around in a circle with some people and at each turn show 0, 1 or 2 thumbs? And each person takes turns calling out a guess for the total number of thumbs that will be shown? Playing that game gives a really strong sense of “Aha! I modeled you correctly because I knew that you knew that I knew …” but I never actually know if it’s real modeling or hindsight bias because of the way the game is played in real time. Maybe there’s a way to modify the rules to test that?
I once spent a very entertaining day with a friend wandering around art exhibits, with both of us doing a lot of “OK, you really like that and that and that and you hate that and that” prediction and subsequent correction.
One thing that quickly became clear was that I could make decent guesses about her judgments long before I could articulate the general rules I was applying to do so, which gave me a really strong sense of having modeled her really well.
One thing that became clear much more slowly was that the general rules I was applying, once I became able to articulate them, were not nearly as complex as they seemed to be when I was simply engaging with them as these ineffable chunks of knowledge.
I concluded from this that that strong ineffable sense of complex modeling is no more evidence of complex modeling than the similar strong ineffable sense of “being on someone’s wavelength” is evidence of telepathy. It’s just the way my brain feels when it’s applying rules it can’t articulate to predict the behavior of complex systems.
This kind of explicit modelling is a recurring fictional trope.
For example, Herbert uses it a lot in Dosadi Experiment to show off how totes cognitively advanced the Dosadi are.
Yes, but aspiring to it as an achievable thing very much strikes me as swallowing fictional evidence whole. (And, around LW, manga and anime.)
No argument; just citing prior fictional art. :-)
Yes. “(Real bounded decision systems will take shortcuts for efficiency and may not achieve perfect rationality, like how real floating point arithmetic isn’t associative).”
On one hand, a lot of this comes down to lacking a proper theory of logical uncertainty (I think).
On the other hand, the usual solution is to step up a level and choose the best decision algorithm, instead of trying to directly compute the best decision. Then you can step up again so that this doesn’t take forever. I don’t know how to bottom this out.
Related: A properly built AI need not do any explicit utility maximizing at all; it could all be built implicitly into hardcoded algorithms, the same way most algorithms have implicit probability distributions. Of course, one of the easiest ways to maximize expected utility is to explicitly do so, but I would still expect most code in an optimized AI to be implicitly maximizing.
What you need to estimate for maximizing utility is not the utility itself but the sign of the difference in expected utilities. “More accurate” estimation of the utility on one side of the comparison can lead to less accurate estimation of the sign of the difference. Which is what Pascal’s muggers exploit.
This is a very good point.
I wonder what the implications are...
The main implication is that actions based on a comparison between the most complete available estimations of utility do not maximize utility. It is similar to evaluating sums: when evaluating 1 - 1/2 + 1/3 - 1/4 and so on, the partial sum 1 + 1/3 + 1/5 + 1/7 is more complete than just 1, since you have processed more terms (and can pat yourself on the head for doing more arithmetic), but it is less accurate. In practice one obtains highly biased “estimates” from someone putting a lot more effort into finding terms of the sign that benefits them the most, and sometimes from some terms being easier to find.
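A quick numerical check of that series example (the alternating series converges to ln 2, roughly 0.693):

```python
# The alternating series 1 - 1/2 + 1/3 - 1/4 + ... converges to ln(2).
# Summing "more" terms, but only the positive ones, gives a worse
# estimate than stopping after the very first term.
import math

true_value = math.log(2)            # ~0.693
first_term_only = 1.0               # error ~0.307
biased_sum = 1 + 1/3 + 1/5 + 1/7    # ~1.676, error ~0.983
balanced_sum = 1 - 1/2 + 1/3 - 1/4  # ~0.583, error ~0.110

for name, value in [("first term only", first_term_only),
                    ("biased positive-only sum", biased_sum),
                    ("balanced partial sum", balanced_sum)]:
    print(name, round(value, 3), "error:", round(abs(value - true_value), 3))
```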
Yes, that is a problem.
Are there other schemes that do a better job, though?
In the above example, attempts to produce the most accurate estimate of the sum do a better job than attempts to produce the most complete sum.
In general, what you learn from applied mathematics is that plenty of methods that are, in some abstract sense, more distant from the perfect method produce results closer to the perfect method’s result.
E.g. the perfect method could evaluate every possible argument, sum all of them, and then decide. The approximate method can evaluate a least-biased sample of the arguments, sum them, and then decide, whereas the method that tries to match the perfect method the most would sum all available arguments. If you could convince an agent that the latter is ‘most rational’ (which may be intuitively appealing because it does resemble the perfect method the most) and is what should be done, then on a complex subject where the agent does not itself enumerate all arguments, you can feed arguments to that agent, biasing the sum, and extract profit of some kind.
“Taken together the four experiments provide support for the Sampling Hypothesis, and the idea that there may be a rational explanation for the variability of children’s responses in domains like causal inference.”
That seems to be behind what I suspect is a paywall, except that the link I’d expect to solicit me for money is broken. Got a version that isn’t?
It’s going through a university proxy, so it’s just broken for you. Here’s the paper: http://dl.dropboxusercontent.com/u/85192141/2013-denison.pdf
Notice the obvious implications to the ability of super-human AI’s to behave VNM-rationally.
Which are what? The AI that is managing some sort of upload society could trade its clock time for utility.
It’s no different from humans, where you can either waste your time pondering whether you’re being rational about how jumpy you are when you see a moving shadow that looks sort of like a sabre-toothed tiger, or you can figure out how to tie a rock to a stick; in modern times, ponder what is a better deal at the store vs. try to invent something and make a lot of money.
It still has to deal with the external world.
But the point is, its computing time costs utility, and so it can’t waste it on things that will not gain it enough utility.
If you consider a 2×1×1 block to have a probability of 1/6 of landing on each side, you can still be VNM-rational about that; then you won’t be Dutch-booked, but you’ll lose money anyway, because that block is not a fair die and you’ll accept losing bets. The real world is like that: it doesn’t give cookies for non-Dutch-bookability, it gives cookies for correct predictions of what is actually going to happen.
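As a sketch of that in code; the face probabilities for the block are invented purely for illustration, and the point is only that an internally coherent 1/6-believer still bleeds money without ever being Dutch-booked:

```python
# An agent that is internally coherent (assigns 1/6 to every face and
# prices bets accordingly) but miscalibrated about a 2x1x1 block.
# The "true" face probabilities below are made up for illustration.
import random

faces = ["square1", "square2", "rect1", "rect2", "rect3", "rect4"]
true_probs = [0.02, 0.02, 0.24, 0.24, 0.24, 0.24]  # assumed, not measured

def roll():
    return random.choices(faces, weights=true_probs)[0]

# The agent pays 1 unit for a bet returning 6 units if a square face
# comes up: by its 1/6-per-face beliefs that is worth 2 units in
# expectation, so it happily accepts.
n, winnings = 100000, 0.0
for _ in range(n):
    winnings -= 1.0
    if roll().startswith("square"):
        winnings += 6.0

print(winnings / n)  # roughly 6 * 0.04 - 1 = -0.76 per bet
```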
Confidence in moral judgments is never a sound criterion for them being “terminal”, it seems to me.
To see why, consider that one’s working values are unavoidably a function of two related things: one’s picture of oneself, and of the social world. Thus, confident judgments are likely to reflect confidence in relevant parts of these pictures, rather than the shape of the function. To take your example, your adverse judgement of authority could have been a reflection of a confident picture of your ideal self as not being submissive, and of human society at its current state of development as being capable of operating without authority (doubtless oversimplifying greatly, but I hope you get the idea).
A crude mathematical model may help. If M is a vector of your moral values, and S and I are your understanding of society and your self-construct respectively, then I am suggesting M = F(S, I). Then the problem is that “terminal values” as I understand them reside in F, but it is only M that is directly accessible to introspection. It is extremely difficult to imagine away the effect of S and I, but one way of making progress should be to vary S and I. That is, try hard to imagine being in an utterly different social context to the one we know, e.g.: an ancestral hunter-gatherer tribal group; a group of castaways on an island, the remainder being young children; an encounter with aliens; a group defending one’s family against an evil oppressor; etc. Likewise, imagine being in the shoes of somebody with very different aptitudes and personality. The things that remain constant, the things that tell us how to deal with all these different cases, are our terminal values. (Or rather, they would be if we could only eliminate self-deception.)
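To make that probing procedure concrete, here is a toy sketch; the scenarios, the stand-in F, and the judgments are all invented for illustration, not a claim about anyone’s actual moral machinery:

```python
# Probe a fixed moral function F with varied pictures of society (S) and
# self (I), and report which output judgments stay constant across probes.
scenarios = [  # crude (S, I) pairs, purely hypothetical
    {"society": "hunter-gatherer band", "self": "strong generalist"},
    {"society": "castaways with young children", "self": "only adult"},
    {"society": "modern industrial state", "self": "comfortable specialist"},
]

def F(scenario):  # stand-in for one's actual moral machinery
    return {
        "cruelty for fun is wrong": True,
        "defer to a central authority": scenario["society"] == "modern industrial state",
    }

outputs = [F(s) for s in scenarios]
for judgment in outputs[0]:
    values = {out[judgment] for out in outputs}
    status = "invariant (terminal candidate)" if len(values) == 1 else "varies with S and I"
    print(judgment, "->", status)
```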
Excellent suggestion.
I would like to add “Nazi” to that list, and note that if you imagine doing something other than the historical results (in those cases where we know the historical result), you’re doing this wrong.
EDIT: reading this over, it sounds kinda sarcastic. Just want to clarify I’m being sincere here.
Yes indeed, it is a challenge to understand how the same human moral functionality “F” can result in a very different value system “M” from one’s own, though I suspect a lot of historical reading would be necessary to fully understand the Nazis’ construction of the social world (“S”, in my shorthand). A contemporary example of the same challenge is the cultures that practice female genital mutilation. You don’t have to agree with a construction of the world to begin to see how it results in the avowed values that emerge from it, but you do have to be able to picture it properly. In both cases, this challenge has to be distinguished from the somewhat easier task of explaining the origins of the value system concerned.
Oh, I didn’t mean it was particularly challenging—at least, as long as you avoid the antipattern of modelling them as Evil Monsters—just that it was a good exercise for this sort of thing. Indeed, I think most people can model the antisemitism (if not the philosophy and rhetorical/emotional power) by imagining society is being subverted by insidious alien Pod People.
Another excellent point.
I don’t know much about FGM or the cultures that practice it, but it might easily be analogous to so-called “male genital mutilation” or circumcision.
Can you describe the process by which you changed your view on authority? I suspect that could be important.
Read lots, think lots, do lots.
More specifically, become convinced of consequentialism so that pragmatic concerns and exceptions can be handled in a principled way, realize that rule by Friendly AI would be acceptable, attempt to actually run a LW meetup and learn of the pragmatic effectiveness of central decision making, notice major inconsistencies in and highly suspect origin of my non-authoritarian beliefs, notice aesthetic value of leadership and following, etc.
“Authority” isn’t necessarily just one thing. For example, an all-powerful Friendly AI could choose to present itself in an extremely deferential way, and even conform exactly to its human users’ wishes. Being a central decisionmaker, projecting high status, having impressive accomplishments, having others feel instinctively deferential to you, and having others actually act deferential to you are all distinct but frequently related. I think at least some of these are worrisome (link).
If you increase the authority of a group’s leader along all the dimensions of authority (which probably happens by default), I’d guess you get increased group coherence at the expense of decreased group rationality. You also run the risk of having the leader’s preferences be satisfied at the expense of the group’s preferences. In situations where it doesn’t actually matter much what you do, and it mostly just matters that everyone does it together in an orderly way, maybe this can be a good trade-off.
This is interesting. For some time, I’ve had my anti-authoritarianism (and anti-governmentism) sort of filed away in the back of my mind as “review this opinion when I think I can handle finding out I’m wrong”. Sounds like you’ve been through the process already.
How much of your change of heart would you attribute to explicit reasoning, aesthetics, and personal experience respectively?
Good question. I wouldn’t say it breaks up so nicely.
First of all, the aesthetic appreciation basically got uncovered when the big aversions went away. It was like, “ok authority can be practical a lot of the time, and oh, look, now that I’m not afraid of it, it’s kind of beautiful after all.”
The personal experience (having been an anarchist, running a LW meetup, etc) mostly just provided a bit of extra verification fuel once anti-authoritarianism was being seriously questioned.
The thing that actually got me to explicitly formulate the whole process was reading Moldbug. He pointed out some glitches in the matrix, so to speak.
I don’t know how to weight the importance of these, or what that would mean. Is there a more specific question you’re interested in?
I think you’re failing to distinguish between authority one voluntarily submits to (potentially even reserving the right to reverse the decision), e.g., meetup organizer, and authority backed by a monopoly on violence, i.e., the modern conception of government.
I hadn’t come across the von Neumann-Morgenstern utility theorem before reading this post, thanks for drawing it to my attention.
Looking at Moral Philosophy through the lens of agents working with utility/value functions is an interesting exercise; it’s something I’m still working on. In the long run, I think some deep thinking needs to be done about what we end up selecting as terminal values, and how we incorporate them into a utility function. (I hope that isn’t stating something that is blindingly obvious.)
I guess where you might be headed is into Meta-Ethics. As I understand it, meta-ethics includes debates on moral relativism that are closely related to the existence of terminal/intrinsic values. Moral relativism asserts that all values are subjective (i.e., only the beliefs of individuals), rather than objective (i.e., universally true). So no practice or activity is inherently right or wrong; it is just the perception of people that makes it so. As you might imagine, this can be used as a defense of violent cultural practices (it could even be used in defense of baby-eating).
I tend to agree with the position of moral relativism; unfortunate though it may be, I’m not convinced there are things that are objectively valuable. I’m of the belief that if there are no agents to value something, then that something has effectively no value. That holds for people and their values too. That said, we do exist, and I think subjective values count for something.
Humanity has come to some degree of consensus over what should be valued. Probably largely as a result of evolution and social conditioning. So from here, I think it mightn’t be wasted effort to explore the selection of different intrinsic values.
Luke Muehlhauser has called morality an engineering problem, while Sam Harris has described morality as a landscape, i.e., the surface is the terminal value we are trying to maximize (Harris picked the well-being of conscious creatures) and societal practices are the variables. Though I don’t know that well-being is the best terminal value, I like the idea of treating morality as an optimization problem. I think this is a reasonable way to view ethics. Without objective values, it might just be a matter of testing different sets of terminal subjective values, until we find the optimum (and hopefully don’t get trapped in a local maximum).
Nevertheless, I think it’s interesting to suppose that something is objectively valuable. It doesn’t seem like a stretch to me to say that the knowledge of what is objectively valuable, would, itself, be objectively valuable. And that the search for that knowledge would probably also be objectively valuable. After all that, it would be somewhat ironic if it turned out the universal objective values don’t include the survival of life on Earth.
The second sentence doesn’t follow from the first. If rational agents converge on their values, that is objective enough. Analogy: one can accept that mathematical truth is objective (mathematicians will converge) without being a Platonist (holding that mathematical truths have an existence separate from humans).
I find that hard to follow. If the test is rationally justifiable, and leads to uniform results, how is that not objective? You seem to be using “objective” (having a truth value independent of individual humans) to mean what I would mean by “real” (having existence independent of humans).
First of all, thanks for the comment. You have really motivated me to read and think about this more—starting with getting clearer on the meanings of “objective”, “subjective”, and “intrinsic”. I apologise for any confusion caused by my incorrect use of terminology. I guess that is why Eliezer likes to taboo words. I hope you don’t mind me persisting in trying to explain my view and using those “taboo” words.
Since I was talking about meta-ethical moral relativism, I hope that it was sufficiently clear that I was referring to moral values. What I meant by “objective values” was “objectively true moral values” or “objectively true intrinsic values”.
The second sentence was an explanation of the first: not logically derived from the first sentence, but a part of the argument. I’ll try to construct my arguments more linearly in future.
If I had to rephrase that passage I’d say:
If there are no agents to value something, intrinsically or extrinsically, then there is also nothing to act on those values. In the absence of agents to act, values are effectively meaningless. Therefore, I’m not convinced that there is objective truth in intrinsic or moral values.
However, the lack of meaningful values in the absence of agents hints at agents themselves being valuable. If value can only have meaning in the presence of an agent, then that agent probably has, at the very least, extrinsic/instrumental value. Even a paperclip maximiser would probably consider itself to have instrumental value, right?
I think there is a difference between it being objectively true that, in certain circumstances, the values of rational agents converge, and it being objectively true that those values are moral. A rational agent can do really “bad” things if the beliefs and intrinsic values on which it is acting are “bad”. Why else would anyone be scared of AI?
I accept the possibility of objective truth values. I’m not convinced that it is objectively true that the convergence of subjectively true moral values indicates objectively true moral values. As far as values go, moral values don’t seem to be as amenable to rigorous proofs as formal mathematical theorems. We could say that intrinsic values seem to be analogous to mathematical axioms.
I’ll have a go at clarifying that passage with the right(?) terminology:
Without the objective truth of intrinsic values, it might just be a matter of testing different sets of assumed intrinsic values until we find an “optimal” or acceptable convergent outcome.
Morality might be somewhat like an NP-hard optimisation problem. It might be objectively true that we get a certain result from a test. It’s more difficult to say that it is objectively true that we have solved a complex optimisation problem.
Thanks for informing me that my use of the term “objective” was confused/confusing. I’ll keep trying to improve the clarity of my communication and understanding of the terminology.
That’s what I like to hear!
But there is no need for morality in the absence of agents. When agents are there, values will be there; when agents are not there, the absence of values doesn’t matter.
I don’t require their values to converge, I require them to accept the truths of certain claims. This happens in real life. People say “I don’t like X, but I respect your right to do it”. The first part says X is a disvalue, the second is an override coming from rationality.
I’m assuming a lot of background in this post that you don’t seem to have. Have you read the sequences, specifically the metaethics stuff?
Moral philosophy on LW is decades (at the usual philosophical pace) ahead of what you would learn elsewhere and a lot of the stuff you mentioned is considered solved or obsolete.
Really? That’s kind of scary if true. Moral philosophy on LW doesn’t strike me as especially well developed (particularly compared to other rationality related subjects LW covers).
I don’t believe anyone’s really taken the metaethics sequence out for a test drive to see if it solves any nontrivial problems in moral philosophy.
It’s worse than that. No one even knows what theory is actually laid out there. EY says different things in different places.
If I recall correctly it struck me as an ok introduction to metaethics but it stopped before it got to the hard (ie. interesting) stuff.
Moral philosophy is not well developed on LW, but I think it’s further than it is elsewhere, and when I look at the pace of developments in philosophy, it looks like it will take decades for everyone else to catch up. Maybe I’m underestimating the quality of mainstream philosophy, though.
All I know is that people who are interested in moral philosophy who haven’t been exposed to LW are a lot more confused than those on LW. And that those on LW are more confused than they think they are (hence the OP).
What do you think represents the best moral philosophy that LW has to offer?
Just a few months ago you seemed to be saying that we didn’t need to study moral philosophy, but just try to maximize “awesomeness”, which “You already know that you know how to compute”. I find it confusing that this post doesn’t mention that one at all. Have you changed your mind since then, if so why? Or are you clarifying your position, or something else?
The metaethics sequence sinks most of the standard confusions, though it doesn’t offer actual conclusions or procedures.
Complexity of value. Value being human-specific. Morality as an optimization target. Etc.
Maybe it’s just the epistemic quality around here though. LWers talking about morality are able to go much further without getting derailed than the best I’ve seen elsewhere, even if there weren’t much good work on moral philosophy on LW.
Right. This is a good question.
For actually making decisions, use Awesomeness or something as your moral proxy, because it more or less just works. For those of us who want to go deeper and understand the theory of morality declaratively, the OP applies; we basically don’t have any good theory. They are two sides of the same coin; the situation in moral philosophy is like the situation in physics a few hundred (progress subjective) years ago, and we need to recognize this before trying to build the house on sand, so to speak. So we are better off just using our current buggy procedural morality.
I could have made the connection clearer I suppose.
This post is actually a sort of precursor to some new and useful (I hope) work on the subject that I’ve written up but haven’t gotten around to polishing and posting. I have maybe 5 posts’ worth of morality-related stuff in the works, and then I’m getting out of this godforsaken dungeon.
Given that we don’t have a good explicit theory of what morality really is, how do you know (and how could you confidently claim in that earlier post) that Awesomeness is a good moral proxy?
I think I understand what you’re saying now, thanks for the clarification. However, my current buggy procedural morality is not “maximize awesomeness” but more like an instinctive version of Bostrom and Ord’s moral parliament.
It seems to fit with intuition. How exactly my intuitions are supposed to imply actual morality is an open question.
Could you nominate some confusions that are unsunk amongst professional philosophers (vis a vis your “decades ahead” claim).
You don’t tend to find much detailed academic discussion regarding metaethical philosophy on the blogosphere at all.
Disclaimers: strictly comparing it to other subjects which I consider similar from an outside view, and supported only by personal experience and observation.
I have, and I found it unclear and inconclusive. A number of people have offered to explain it, and they all ended up bowing out, unable to do so.
I find no evidence for that claim.
Sorry, I have only read selections of the sequences, and not many of the posts on metaethics. Though as far as I’ve gotten, I’m not convinced that the sequences really solve, or make obsolete, many of the deeper problems of moral philosophy.
The original post, and this one, seems to be running into the “is-ought” gap and moral relativism. Being unable to separate terminal values from biases is due to there being no truly objective terminal values. Despite Eliezer’s objections, this is a fundamental problem for determining what terminal values or utility function we should use—a task you and I are both interested in undertaking.
I think this community vastly over-estimates its grip on meta-ethical concepts like moral realism or moral anti-realism. (E.g. the hopelessly confused discussion in this thread). I don’t think the meta-ethics sequence resolves these sorts of basic issues.
I’m still coming to terms with the philosophical definitions of different positions and their implications, and the Stanford Encyclopedia of Philosophy seems like a more rounded account of the different view points than the meta-ethics sequences. I think I might be better off first spending my time continuing to read the SEP and trying to make my own decisions, and then reading the meta-ethics sequences with that understanding of the philosophical background.
By the way, I can see your point that objections to moral anti-realism in this community may be somewhat motivated by the possibility that friendly AI becomes unprovable. As I understand it, any action can be “rational” if the value/utility function is arbitrary.
There is a lot of diversity of opinion among philosophers, and while that may be true of the discipline as a whole, there is some good stuff to be found there. I’d recommend staying here for the most part rather than wading through philosophy elsewhere, though.
Also, many moral philosophers may have very different moral sentiments from you, and maybe that makes them seem like idiots more than they actually are. Different moral sentiments about whether to accept consequentialism at all, rather than just differences within consequentialism, among other things.
One day we’re going to have to unpack “aesthetic” a bit. I think it’s more than just ‘oh it feels really nice and fun’, but after we used it as applied to HPMOR and Atlas Shrugged—or parable fiction in general—I’ve been giving it a similar meaning as ‘mindset’ or ‘way of viewing’. It’s becoming less clear to me as to how to use the term.
I’ve been using it in justifications of reading (certain) fiction now, but I want to be careful that I’m not talking about something else, or something that doesn’t exist, so my rationality can aim true.
What has aesthetics got to do with HPMOR and AS?
Just taboo all the weird terms, and be specific. What are you reading and why?
That’s much more of a qualitative difference based on how much say you have.
Is there a reason you say “terminal value” rather than “intrinsic value”?
It’s the preferred local term. But if Wikipedia is to be believed, it’s also a term used by mainstream philosophers, just less commonly.
I’m not nyan_sandwich, but here’s why I think those are different things and would use the former, not the latter, for what’s being said here. An “intrinsic value” is intrinsic to the thing being valued; e.g., perhaps some beautiful things are beautiful in a way that’s got nothing to do with the particular tastes human beings happen to have, and are just Beautiful In Themselves. A “terminal value” is terminal to the agent doing the valuing; e.g., perhaps my dislike of celery isn’t reducible to any other preferences and principles I have, but just Is What It Is, and other value judgements I make build on my fundamental, irreducible, dislike of celery.
There can be terminal values even if there aren’t intrinsic values (e.g., maybe no value judgement is ever meaningful outside the context of a particular value system, but some value systems really do have things sufficiently axiom-like to be rightly called terminal values). There can be intrinsic values even if there aren’t terminal values (e.g., maybe there is One True Value System but it doesn’t have the sort of logical structure that would make some of its values terminal).
nyan_sandwich is writing about the structure of human beings’ value systems, and suggesting that they don’t involve anything as axiom-like as terminal values. NS is not writing about the objective moral structure of the universe and suggesting that it doesn’t involve ascribing intrinsic value to particular things. Therefore, the proposition NS is endorsing is “we don’t have terminal values”, rather than “things don’t have intrinsic value”.
[EDITED to avoid formatting screwage from multiple underscores.]
On what basis do you assert you were “reasoned out” of that position? For example, what about your change of mind causes you to reject a conversion (Edit: not conversation) metaphor?
Yes, that’s the problem with the conversion metaphor. If reasoning does not cause changes in terminal values, then it seems like terminal values are not real for some sense of real. Yet moral anti-realism feels so incredibly unintuitive.
Edit: The other way you might respond is that you have realized that you still value freedom, but have recently realized it is not a terminal value. But that makes the example less useful in figuring out how actual terminal values work.
TimS mentioned moral anti-realism as one possibility. I have a favorable opinion of desire utilitarianism (search for pros and cons), which is a system that would be compatible with another possibility: real and objective values, but not necessarily any terminal values.
By analogy, such a situation would be a description for moral values like epistemological coherentism (versus foundationalism) describes knowledge. The mental model could be a web rather than a hierarchy. At least it’s a possibility—I don’t intend to argue for or against it right now as I have minimal evidence.
I’ll admit it’s rather shaky and I’d be saying the same thing if I’d merely been brainwashed. It doesn’t feel like it was precipitated by anything other than legitimate moral argument, though. If I can be brainwashed out of my “terminal values” so easily, and it really doesn’t feel like something to resist, then I’d like a sturdier basis on which to base my moral reasoning.
What is a conversation metaphor? I’m afraid I don’t see what you’re getting at.
I still value freedom in what feels like a fundamental way, I just also value hierarchy and social order now. What is gone is the extreme feeling of ickyness attached to authority, and the feeling of sacredness attached to freedom, and the belief that these things were terminal values.
The point is that things I’m likely to identify as “terminal values”, especially in the contexts of disagreements, are simply not that fundamental, and are much closer to derived surface heuristics or even tribal affiliation signals.
I feel like I’m not properly responding to your comment though.
Nyan, I think your freedom example is a little off. The converse of freedom is not bowing down to a leader. It’s being made to bow. People choosing to bow can be beautiful and rational, but I fail to see any beauty in someone bowing when their values dictate they should stand.
My fault for failing to clarify. There are roughly three ways one can talk about changes to an agent’s terminal values.
(1) Such changes never happen. (At a society level, this proposition appears to be false).
(2) Such changes happen through rational processes (i.e. reasoning).
(3) Such changes happen through non-rational processes (e.g. tribal affiliation + mindkilling).
I was using “conversion” as a metaphorical shorthand for the third type of change.
BTW, you might want to change “conversation” to “conversion” in the grandparent.
Ah! Thanks.
Ok. Then my answer to that is roughly this:
This could of course use more detail, unless you understand what I’m getting at.
That’s certainly a serious risk, especially if terminal values work like axioms. There’s a strong incentive in debate or policy conflict to claim an instrumental value was terminal just to insulate it from attack. And then, by process of the failure mode identified in Keep Your Identity Small, one is likely to come to believe that the value actually is a terminal value for oneself.
I took your essay as trying to make a meta-ethical point about “terminal values” and how using the term with an incoherent definition causes confusion in the debate. Parallel to when you said if we interact with an unshielded utility, it’s over, we’ve committed a type error. If that was not your intent, then I’ve misunderstood the essay.
Oops, it wasn’t really about how we use terms or anything. I’m trying to communicate that we are not as morally wise as we sometimes pretend to be, or think we are. That Moral Philosophy is an unsolved problem, and that we don’t even have a good idea of how to solve it (unlike, say, physics, where it’s unsolved but the problem is understood).
This is in preparation for some other posts on the subject, the next of which will be posted tonight or soon.
That said, there have been centuries of work on the subject, which Eliezer unfortunately threw out because VHM-utilitarianism is so mathematically elegant.
Are you sure you aren’t simply trading open ended beliefs for those that circularly support themselves to a greater extent? When you trust in an authority which tells you to trust in that authority, that’s sturdier.
Gygax would say your alignment has shifted a step toward Lawful. I tend to prefer the Exalted system, which could represent such a shift through the purchase of a third dot in the virtue of Temperance.
Thanks, it’s a great post.
I wonder if, and when, we should behave as if we were VNM-rational. It seems vital to act VNM-rational if we’re interacting with Omega or for that matter anyone else who is aware of VNM-rationality and capable of creating money pumps. But as you point out we don’t have VNM-utility functions. Therefore, there exist some VNM-rational decisions that will make us unhappy. The big question is whether we can be happy about a plan to change all of our actual preferences so that we become VNM-rational, and if not, is there a way to remain happy while strategically avoiding money pumps and other pitfalls of being not-entirely-VNM-rational.
I expect problems in my VNM-rational future for any preference that is not maximal. If I VNM-prefer A over B, then at some point I will have to answer the question “Why should I ever use any resources to increase the probability of B in a lottery between A and B?” At some point I probably won’t do B anymore, although I’ll have VNM-rationally more A to make up for it. Except that I like variety and probably have the circular preference of sometimes liking B more than A, which leads directly to money pumping if some other agent knows exactly how my preferences work. I also occasionally enjoy surprises, but I am pretty sure that is equivalent to having a “specific utility of gambling”.
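Here is a sketch of that money pump with made-up numbers, just to see the mechanism; an agent with a strict preference cycle pays a small fee at every step and ends up exactly where it started:

```python
# Money pump against cyclic preferences: the agent strictly prefers
# A to B, B to C, and C to A, and will pay a small fee to trade up.
preferences = {("A", "B"), ("B", "C"), ("C", "A")}  # (preferred, over)
fee = 0.01

def will_trade(offered, held):
    return (offered, held) in preferences

holding, money = "A", 0.0
for offered in ["C", "B", "A"] * 3:  # the pump walks the cycle three times
    if will_trade(offered, holding):
        holding, money = offered, money - fee

# The agent holds exactly what it started with, but is nine fees poorer.
print(holding, round(money, 2))  # A -0.09
```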
Perhaps I can use VNM-rationality to achieve the maximal happiness of this non-VNM-rational me. Even if that involves satisfying a lot of circular preferences in a sufficiently pseudo-random order to trigger my desire for variability. But that implies that ultimately I do have a utility function, or at least there exists a utility function that I wish to be maximized. Its preferences are not my actual preferences. I think it’s a type error to even try to compare my preferences with its preferences. It is not my utility that is being maximized but rather the utility of a process that makes the world as awesome and happy for me as possible by VNM-irrationally causing my happiness. If I am not careful in designing this process it will wirehead me or simply tile the universe with my smiling face. But those things would not make me happy. Is there a way to make a VNM-rational utility function care about a VNM-irrational being in such a way that the irrational preferences are satisfied to the greatest extent without violating the wireheading-unhappiness or pointless-tiling-of-the-universe-unhappiness preferences? In my opinion that is the practical question to answer when investigating Moral Philosophy in terms of FAI.
I have not yet come to terms with how constructs of personal identity fit in with having or not having a utility function. What if it makes most sense to model my agency as a continuous limit of a series of ever more divided discrete agents who bring subsequent, very similar, future agents into existence? Maybe each of those tiny-in-time-extent agents have a utility function, and maybe that’s significant?
I think it makes the most sense to learn a ton of cognitive neuroscience and figure out what that neuroscience suggests about how you should model yourself. Kaj Sotala’s mini-sequence about the modular mind seems to be a good place to start.
I think your definition of terminal value is a little vague. The definition I prefer is as follows. A value is instrumental if it derives its value from its ability to make other values possible. To the degree that a value is not instrumental, it is a terminal value. Values may be fully instrumental (money), partially instrumental (health [we like being healthy, but it also lets us do other things we like]) or fully terminal (beauty).
Terminal values do not have the warm fuzzy glow of high concepts. Beauty, truth, justice, and freedom may be terminal values, but they aren’t the only ones. They aren’t even the most clear-cut examples. One of the clearest examples of a terminal value is sexual pleasure. It is harder to argue that it is instrumental to a higher value, or more dependent on other facts and circumstances, than any of the above examples.
Also, how does identifying terminal values help us make choices? We must still choose between our values. If we split our values into terminal and instrumental, it will still be rational to choose instrumental values over terminal values sometimes. I’d rather make a million dollars (instrumental value) than a painting short of a masterpiece (terminal value). Identifying values as terminal does not prevent us from having to choose between them, either.
Either that or a bias. The difficulty (or even impossibility) of separating out biases from terminal values is the main problem with thinking of oneself as a VNM-utilitarian.
What? How so?
Are there other theories that don’t have this problem?
(for reference, I take VNM seriously but not absolutely, and I don’t take utilitarianism seriously.)
You had an entire post on the subject, you even linked to it in the OP.
I’m not sure. My point was that VNM is not nearly as final a solution to morality as a lot of people around here seem to think.
Sorry, I read your comment as implying that it was a failure of VNM in particular.
The difference between instrumental and terminal values are in the perception of the evaluator. If they believe that something is useful to achieve other values, then it is an instrumental value. If they are wrong about its usefulness, that makes it an error in evaluation, not a terminal value. The difference between instrumental and terminal values is in the map, not in the territory. For someone who believes in astrology, getting their horoscope done is an instrumental value.
In practice this criterion is frequently circular. See also the blue minimizing robot.
First off, I think your observations about terminal values are spot-on, and I was always confused by how little we actually talk about these queer entities known as terminal values.
This discussion reminds me a bit of Scanlon’s What We Owe To Each Other. His formulation of moral discourse strikes me as a piece of Meta-Moral philosophy: ‘An act is wrong if and only if any principle that permitted it would be one that could reasonably be rejected by people moved to find principles for the general regulation of behaviour that others, similarly motivated, could not reasonably reject’, since he seems to be dealing with what we would term terminal values without referring to them as such, but I may be off-base here.
The term “terminal values” kinda assumes a consequentialist meta-ethical framework, I think; and that particular statement (and Scanlon in general) is more on the contractualist side; a framework opposed to consequentialism.
You do have a utility function (though it may be stochastic). You just don’t know what it is. “Utility function” means the same thing as “decision function”; it just has different connotations. Something determines how you act; that something is your utility function, even if it can be described only as a physics problem plus random numbers generated by your free will and adjustments made by God. (God must be encapsulated in an oracle function.) We call it a utility function to clue people into our purposes and the literature that we’re going to draw on for our analysis. If we wished to regard a thing as deterministic rather than as an agent with free will, we would call its decision function a probability density function instead of a utility function.
If you truly have terminal values, they are mainly described by a large matrix of synaptic connections and weights.
When you say “I don’t have a utility function” or “I don’t have terminal values”, you are mostly complaining that approximations are only approximations. You are thinking about some approximation of your utility function or your terminal values, expressed in language or logic, using symbols that conveniently but inaccurately cluster all possible sense-experience vectors into categories, and logical operations that throw away all information but the symbols (and perhaps some statistics, such as a probability or typicality for each symbol).
When we use the words “utility function”, the level of abstraction to use to describe it, and hence its accuracy, depends on the purpose we have in mind. What’s incoherent is talking about “my utility function” absent any such purpose. It’s just like asking “What is the length of the coast of England?”
Whether you have terminal values is a more-complicated question, for uninteresting reasons such as quantum mechanical considerations. The short answer is probably, Any level of abstraction that is simple enough for you to think about, is too simple to capture values that are guaranteed not to change.
Underneath both these questions is the tricky question, “Which me is me?” Are you asking about the utility function enacted by the set of SNPs in your DNA, by your body, or by your conscious mind? These are not the same utility functions. (Whether your conscious mind has a utility function is a tricky question because we would have to separate actions controlled by your conscious mind from actions your body takes not controlled by your conscious mind. If consciousness is epiphenomenal, your mind does not have a useful utility function.)
One common use of terminal values on LW is to try to divine a set of terminal values for humans that can be used to guide an AI. So a specific, meaningful, useful question would be, “Can I discover and describe my terminal values in enough detail that I can be confident that an AI, controlled by these values, will enact the coherent extrapolated volition of these values?” (“Coherent extrapolated volition” may be meaningless, but that’s a separate issue.) I believe the answer is no, which is one reason why I don’t support MIRI’s efforts toward FAI.
Eliezer spent a lot of time years ago explaining in detail why giving an AI goals like “Make humans happy” is problematic, and began to search for the appropriate level of description of goals/values. He unfortunately didn’t pursue this to its conclusion, and chose to focus on errors caused by drift from the original utility function, or by logics that fail to achieve rationality, to the exclusion of consideration of changes caused by the inevitable inexactness of a representation of a utility function and the random component of the original utility function, or of the tricky ontological questions that crop up when you ask, “Whose utility function?”
This contradicts my knowledge. By “utility function”, I mean that thing which VNM proves exists; a mapping from possible worlds to real numbers.
Where are the references for “utility function” being interchangeable with “decision algorithm”? I have never seen that stated in any technical discussion of decisions.
I’m confused.
Do you just mean the difference between modeling a thing as an agent, vs modeling it as a causal system?
Can you elaborate on how this relates here?
Agree. Moral philosophy is hard. I’m working on it.
Can you elaborate on why you think it is impossible for a machine to do good things? Or why such a question is meaningless?
Tricky question indeed. Again, working on it.
I have a utility function, but it is not time-invariant, and is often not continuous on the time axis.
And I’m a universe. Just a bit stochastic around the edges...
Universes are like that. Are you deterministic, purely stochastic, or do you make decisions?
What? Not having terminal values means that either you don’t care about anything at all, or that “the recursive chain of valuableness” is infinitely deep. Neither of these seems likely to me.
I think there’s a third possibility: values have a circular, strange-loop structure.
As far as I can tell, you aren’t making any argument for your position that we have utility functions. You are merely asserting it.
“Whenever you do anything, that which determines your action—whatever it may be—can be called a decision—or utility—function. You are doing something, ergo you have a utility function.”
I have never seen “utility function” used like this in any technical discussion. Am I missing something?
I think Phil is confusing the economist’s (descriptive) utility function, with the VNM-ethicist’s (prescriptive) utility function. Come to think of it, a case could be made that the VNM-utilitarian is similarly confused.
Care to expand on what you mean by VNM-utilitarian? You refer to it a lot and I’m never quite sure what you mean.
(I’m also interested in what you think of it).
By VNM-utilitarianism I mean the family of moral theories which hold that one should act to maximize a utility function. Around here this is sometimes called “consequentialism” or simply “utilitarianism”. Unfortunately, both terms are ambiguous: it’s possible to have consequentialist theories that aren’t based on a utility function, and “utilitarianism” is also used to mean the theory with the specific utility function of total happiness. Thus, I’ve taken to using “VNM-utilitarianism” as a hopefully less ambiguous and self-explanatory term.
As for what I think of VNM-utilitarianism, this comment gives a brief summary.
When it is called ‘utilitarianism’ there are other people who call it wrong. I recommend saying consequentialism to avoid confusion. Mind you, I don’t even know what you mean by those letters (VHM). My best guess is that you mean the Von Neumann Morgenstern utility theorem but got the letters wrong.
If you are referring to those axioms, then you could also consider saying VNM-utility instead of VNM-utilitarianism, because those words have meanings far more different than their shared etymology might suggest.
Oops. Fixed.
That’s why I talk about “VNM-utilitarianism” rather than simply “utilitarianism”.
That isn’t enough to disambiguate the meaning. In fact, your intended meaning is not even one of the options to disambiguate between. Your usage is still wrong and misleading. I suggest following nshepperd’s advice and using “VNM-rational” or “VNM-rationality”.
(Obviously I will be downvoting all comments that persist with “VNM-utilitarianism”. Many others will not downvote but will take your muddled terminology to be strong evidence that you are confused or ill-informed about the subject matter.)
I’m curious, what were the options for what you thought it meant?
How about “VNM-consequentialism”?
Utilitarianism in practice means some kind of aggregation of all people’s preferences. Most typically either ‘total’ or ‘average’. Even though I am a consequentialist (at least in a highly abstract compatibilist sense) I dismiss utilitarianism as stupid, arbitrary and not worth privileging as a moral hypothesis. Adding VNM to it effectively narrows it down to ‘preference utilitarianism’, which at least gets rid of the worst of the crazy (‘hedonic utilitarianism’. Gahh!). But I don’t think that is what you are trying to refer to when you challenge VNM-X (because it wouldn’t be compatible with the points you make).
Perfect! Please do. ‘Consequentialism’ means what one would naively expect ‘utilitarianism’ to mean, if not for an unfortunate history of bad philosophy having defined the term already. The VNM qualifier then narrows consequentialism down to the typical case that we tend to mean around here (because you are right: technically, consequentialism is broader than just the kind based on the VNM axioms).
I believe “VNM-utilitarianism” is problematic because it would suggest that it is a kind of utilitarianism. By the most usual definition of “utilitarianism” (a moral theory involving an ‘objective’ aggregative measure of value + utility-maximising decision theory) it is not.
However, I remember “VNM-rational” and “VNM-rationality” being accepted terminology.
No, I don’t think it’s just descriptive vs prescriptive; I mentioned both in my post and asserted that we had neither. Phil is saying that we do have a decision algorithm (I agree), and further, that “utility function” means “decision algorithm” (which I disagree with, but I’m not one to argue terminology).
Economists frequently assume humans have utility functions as part of their spherical cow model of human behavior. Unfortunately, they sometimes forget that this is just a spherical cow model, especially once one gets away from modeling collective economic behavior.
Wasn’t there a whole sequence on this?
Yep, but I find that after that, I’m still a long way from actually knowing how to write a program that does moral philosophy, even in principle, whereas physics is much further along (Solomonoff induction, etc.).
The thermostat in my room doesn’t know what it wants either. However, a utility function models its behaviour pretty well.
Consciousness is the brain’s PR department. If it’s evasive about what it wants, that could be part of an attempt to manipulate others—e.g. see The Folly of Fools: The Logic of Deceit and Self-Deception in Human Life.
Oh, not again! The thermostat is not modelled by a utility function at all. Utility functions are completely irrelevant to understanding a thermostat.
That’s a silly assertion. A thermostat can be trivially modeled using a utility function: positive utility for decreasing the temperature when it is too high and for increasing it when it is too low, zero utility for other behaviours. This is not a difficult case to understand.
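Here is a minimal sketch of the kind of model I mean, in Python; the target temperature, action set, and function names are mine, purely for illustration:

```python
# Toy model: a thermostat as a trivial utility maximizer.
# All names and numbers are illustrative, not taken from any real device.

TARGET = 20.0  # target temperature in degrees C

def utility(temperature, action):
    """Positive utility for cooling when too hot or heating when too cold;
    zero utility for every other action."""
    if temperature > TARGET and action == "cool":
        return 1.0
    if temperature < TARGET and action == "heat":
        return 1.0
    return 0.0

def choose_action(temperature):
    # The "agent" simply picks whichever action maximizes utility.
    return max(["idle", "heat", "cool"], key=lambda a: utility(temperature, a))

assert choose_action(25.0) == "cool"
assert choose_action(15.0) == "heat"
assert choose_action(20.0) == "idle"
```

The utility function here is just a relabelling of the thermostat’s control rule, which is exactly what makes the modelling trivial.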
You can also trivially model a thermostat using lego bricks. However, you don’t need a lego-based model to understand a thermostat, it doesn’t lend itself to the task, just like you don’t arbitrarily choose a programming language regardless of your task just because it’s Turing complete.
There is nothing about a simple finite state machine like a thermostat that would cause a modeller to go “how can I drag utility functions into this”, even if it is, of course, possible. I’d go so far as to assert that you could (but shouldn’t) model anything computable in a way involving a utility function.
That’s a complete straw man. I never claimed that you did. What I said was: “a utility function models its behaviour pretty well”—which is perfectly true.
Any computable agent. If it isn’t clear how to decompose a system into sensors and actuators, representation in terms of a utility function is not so useful, because it is not unique. It is convenient to use utility functions when you want to compare the values of different agents. If that’s what you are doing, utility functions seem like a suitable tool.
Trivially. Quite. So trivially that anything at all can be “modelled” by a utility function at that level of triviality.
I’ve a great new utility-based model of the universe! The universe as it is has utility 1. Every other hypothetical universe has utility 0.
That’s an Occam’s razor fail, though. Explanations need to be concise to be satisfying. You’ll find that, if you compress that utility function, you will be onto something interesting.
My fake explanation was precisely as concise as yours.