ETA: This is a meta comment about some aspects of some comments on this post and what I perceive to be problems with the sort of communication/thinking that leads to the continued existence of those aspects. This comment is not meant to be taken as a critique of the original post.
ETA2: This comment lacks enough concreteness to act as a serious consideration in favor of one policy over another. Please do not read it as a suggestion for how LW should normatively respond to anything; instead, consider on an individual basis whether you might personally benefit from adopting the sort of policy I seem to be suggesting.
Why are people on Less Wrong still talking about ‘their’ ‘values’ using deviations from a model that assumes they have a ‘utility function’? It’s not enough to explicitly believe and disclaim that this is obviously an incorrect model, at some point you have to actually stop using the model and adopt something else. People are godshatter, they are incoherent, they are inconsistent, they are an abstraction, they are confused about morality, their revealed preferences aren’t their preferences, their revealed preferences aren’t even their revealed preferences, their verbally expressed preferences aren’t even preferences, the beliefs of parts of them about the preferences of other parts of them aren’t their preferences, the beliefs of parts of them aren’t even beliefs, preferences aren’t morality, predisposition isn’t justification, et cetera…
Can we please avoid using the concept of a human “utility function” even as an abstraction, unless it obviously makes sense to do so? If you’re specific enough and careful enough it can work out okay (e.g. see JenniferRM’s comment) but generally it is just a bad idea. Am I wrong to think this is both obviously and non-obviously misleading in a multitude of ways?
Don’t you think people need to go through an “ah ha, there is such a thing as rationality, and it involves Bayesian updating and expected utility maximization” phase before moving on to “whoops, actually we don’t really know what rationality is and humans don’t seem to have utility functions”? I don’t see how you can get people to stop talking about human utility functions unless you close LW off from newcomers.
I don’t see how you can get people to stop talking about human utility functions unless you close LW off from newcomers.
I was pretty happy before LW, until I learnt about utility maximization. It tells me that I ought to do what I don’t want to do on any other than some highly abstract intellectual level. I don’t even get the smallest bit of satisfaction out of it, just depression.
Saving galactic civilizations from superhuman monsters burning the cosmic commons, walking into death camps so as to reduce the likelihood of being blackmailed, discounting people by the length of their address in the multiverse... taking all that seriously and keeping one’s sanity, that’s difficult for some people.
What LW means by ‘rationality’ is to win in a hard-to-grasp sense that is often completely detached from the happiness and desires of the individual.
It tells me that I ought to do what I don’t want to do on any other than some highly abstract intellectual level. I don’t even get the smallest bit of satisfaction out of it, just depression.
If this is really having that effect on you, why not just focus on things other than abstract large-scale ethical dilemmas, e.g. education, career, relationships? Progress on those fronts is likely to make you happier, and if you want to come back to mind-bending ethical conundrums you’ll then be able to do so in a more productive and pleasant way. Trying to do something you’re depressed and conflicted about is likely to be ineffective or backfire.
Yeah, I have found that when my mind breaks, I have to relax while it heals before I can engage it in the same sort of vigorous exercise again.
It’s important to remember that that’s what is going on. When you become overloaded and concentrate on other things, you are not neglecting your duty. Your mind needs time to heal and become stronger by processing the new information you’ve given it.
Not necessarily; sometimes people are doing exactly that, depending on what you mean by “overloaded”.
Hmm… I think I’ve slipped into “defending a thesis” mode here. The truth is that the comment you replied to was much too broad, and incorrect as stated, as you correctly pointed out. Thanks for catching my error!
You are right, it depends on the specifics. And if you focus on other things with no plan to ever return to the topic that troubled you, that’s different. But if you’ve learned things that make demands on your mind beyond what it can meet, then failing to do what is in fact impossible for you is not negligence.
Gosh, recurring to jsteinhardt’s comment, everything should add up to normality. If you feel that you’re being led by abstract reasoning in directions that consistently feel wrong, then there’s probably something wrong with the reasoning. My own interest in existential risk reduction is that when I experience a sublime moment I want people to be able to have more of them for a long time. If all there was was a counterintuitive abstract argument I would think about other things.
If you feel that you’re being led by abstract reasoning in directions that consistently feel wrong then there’s probably something wrong with the reasoning.
Yup, my confidence in the reasoning here on LW and my own ability to judge it is very low. The main reason for this is described in your post above: taken to its logical extreme, you end up doing seemingly crazy stuff like trying to stop people from creating baby universes rather than solving friendly AI.
I don’t know how to deal with this. Where do I draw the line? What are the upper and lower bounds? Are risks from AI above or below the line of uncertainty that I better ignore, given my own uncertainty and the uncertainty in the meta-level reasoning involved?
I am too uneducated and probably not smart enough to figure this out, yet I face the problems that people who are much more educated and intelligent than me devised.
If a line of reasoning is leading you to do something crazy, then that line of reasoning is probably incorrect. I think that is where you should draw the line. If the reasoning is actually correct, then by learning more your intuitions will automatically fall in line with the reasoning and it will not seem crazy anymore.
In this case, I think your intuition correctly diagnoses the conclusion as crazy. Whether you are well-educated or not, the fact that you can tell the difference speaks well of you, although I think you are causing yourself way too much anxiety by worrying about whether you should accept the conclusion after all. Like I said, by learning more you will decrease the inferential distance you will have to traverse in such arguments, and better deduce whether they are valid.
That being said, I still reject these sorts of existential risk arguments based mostly on intuition, plus I am unwilling to do things with high probabilities of failure, no matter how good the situation would be in the event of success.
ETA: To clarify, I think existential risk reduction is a worthwhile goal, but I am uncomfortable with arguments advocating specific ways to reduce risk that rely on very abstract or low-probability scenarios.
The main reason for this is described in your post above: taken to its logical extreme, you end up doing seemingly crazy stuff like trying to stop people from creating baby universes rather than solving friendly AI.
There are many arguments in this thread that this extreme isn’t even correct given the questionable premises, have you read them? Regardless, though, it really is important to be psychologically realistic, even if you feel you “should” be out there debating with AI researchers or something. Leading a psychologically healthy life makes it a lot less likely you’ll have completely burnt yourself out 10 years down the line when things might be more important, and it also sends a good signal to other people that you can work towards bettering the world without being some seemingly religiously devout super nerd. One XiXiDu is good, two XiXiDus is a lot better, especially if they can cooperate, and especially if those two XiXiDus can convince more XiXiDus to be a little more reflective and a little less wasteful. Even if the singularity stuff ends up being total bullshit or if something with more “should”-ness shows up, folk like you can always pivot and make the world a better place using some other strategy. That’s the benefit of keeping a healthy mind.
[Edit] I share your discomfort but this is more a matter of the uncertainty intrinsic to the world that we live in than a matter of education/intelligence. At some point a leap of faith is required.
I was pretty happy before LW, until I learnt about utility maximization. It tells me that I ought to do what I don’t want to do on any other than some highly abstract intellectual level. I don’t even get the smallest bit of satisfaction out of it, just depression.
IMHO, utilitarianism is a major screw-up for a human being. It is an unnatural philosophy which lacks family values and seems to be used mostly by human beings for purposes of signalling and manipulation.
I was pretty happy before LW, until I learnt about utility maximization. It tells me that I ought to do what I don’t want to do on any other than some highly abstract intellectual level.
Two things seem off. The first is that expected utility maximization isn’t the same thing as utilitarianism. Utility maximization can be done even if your utility function doesn’t care at all about utilitarian arguments, or is unimpressed by arguments in favor of scope sensitivity. But even after making that substitution, why do you think Less Wrong advocates utilitarianism? Many prominent posters have spoken out against it both for technical reasons and ethical ones. And arguments for EU maximization, no matter how convincing they are, aren’t at all related to arguments for utilitarianism. I understand what you’re getting at—Less Wrong as a whole seems to think there might be vitally important things going on in the background and you’d be silly to not think about them—but no one here is going to nod their head disapprovingly or shove math in your face if you say “I’m not comfortable acting from a state of such uncertainty”.
And I link to this article again and again these days, but it’s really worth reading: http://lesswrong.com/lw/uv/ends_dont_justify_means_among_humans/ . This doesn’t apply so much to epistemic arguments about whether risks are high or low, but it applies oh-so-much to courses of action that stem from those epistemic arguments.
The first is that expected utility maximization isn’t the same thing as utilitarianism.
The problem is that if I adopt unbounded utility maximization, then I perceive it to converge with utilitarianism. Even completely selfish values seem to converge with utilitarian motives. Not only does every human, however selfish, care about other humans, but they are also instrumental to their own terminal values.
Solving friendly AI is a way to survive. As long as you don’t expect to be able to overpower all other agents by creating your own FOOMing AI, the best move is to play the altruism card and argue in favor of making an AI friendly_human.
Another important aspect is that it might be rational to treat copies of you, or agents with similar utility-functions (or ultimate preferences), as yourself (or at least assign non-negligible weight to them). One argument in favor of this is that the goals of rational agents with the same preferences will ultimately converge and are therefore instrumental in realizing what you want.
But even if you care only a little about anything but near-term goals revealed to you by naive introspection, taking into account infinite (or nearly infinite, e.g. 3^^^^3) scenarios can easily outweigh those goals.
All in all, if you adopt unbounded utility maximization and you are not completely alien, you might very well end up pursuing utilitarian motives.
A real-world example is my vegetarianism. I assign some weight to sub-human suffering, enough to outweigh the joy of eating meat. Yet I am willing to consume medical comforts that are a result of animal experimentation. I would also eat meat if I would otherwise die. Yet, if the suffering were great enough I would die even for sub-human beings, e.g. 3^^^^3 pigs being eaten. As a result, if I take into account infinite scenarios, my terminal values converge with those of someone subscribed to utilitarianism.
The problem, my problem, is that if all beings thought like this and sacrificed their own lives, no being would end up maximizing utility. This is contradictory. One might argue that it is incredibly unlikely to be in the position to influence so many other beings, and that one should therefore devote some resources to selfish near-term values. But charities like the SIAI claim that I am in the position to influence enough beings to outweigh any other goals. At the end of the day I am left with the decision to either abandon unbounded utility maximization or indulge myself into the craziness of infinite ethics.
At the end of the day I am left with the decision to either abandon unbounded utility maximization or indulge myself into the craziness of infinite ethics.
How about, for example, assigning .5 probability to a bounded utility function (U1), and .5 probability to an unbounded (or practically unbounded) utility function (U2)? You might object that taking the average of U1 and U2 still gives an unbounded utility function, but I think the right way to handle this kind of value uncertainty is by using a method like the one proposed by Bostrom and Ord, in which case you ought to end up spending roughly half of your time/resources on what U1 says you should do, and half on what U2 says you should do.
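A minimal sketch of the contrast being drawn here, with made-up numbers and function names (this is not Bostrom and Ord’s actual procedure, only an illustration of why credence-weighted averaging behaves differently from splitting resources by credence):

```python
# Two toy value systems: U1 is bounded, U2 is (practically) unbounded.
# Credence-weighted averaging lets U2 dominate every decision; splitting a
# resource budget in proportion to credence gives each a guaranteed share.

credences = {"U1_bounded": 0.5, "U2_unbounded": 0.5}

def average_utility(u1, u2, p1=0.5, p2=0.5):
    # Naive aggregation: any huge payoff under U2 swamps anything U1 cares about.
    return p1 * u1 + p2 * u2

def split_budget(credences, budget):
    # The alternative gestured at above: divide resources by credence.
    total = sum(credences.values())
    return {name: budget * c / total for name, c in credences.items()}

print(average_utility(u1=10, u2=10**100))      # ~5e99, so U1 never matters
print(split_budget(credences, budget=100.0))   # {'U1_bounded': 50.0, 'U2_unbounded': 50.0}
```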
I haven’t studied all the discussions on the parliamentary model, but I’m finding it hard to understand what the implications are, and hard to judge how close to right it is. Maybe it would be enlightening if some of you who do understand the model took a shot at answering (or roughly approximating the answers to) some practice problems? I’m sure some of these are underspecified and anyone who wants to answer them should feel free to fill in details. Also, if it matters, feel free to answer as if I asked about mixed motivations rather than moral uncertainty:
I assign 50% probability to egoism and 50% to utilitarianism, and am going along splitting my resources about evenly between those two. Suddenly and completely unexpectedly, Omega shows up and cuts down my ability to affect my own happiness by a factor of one hundred trillion. Do I keep going along splitting my resources about evenly between egoism and utilitarianism?
I’m a Benthamite utilitarian but uncertain about the relative values of pleasure (measured in hedons, with a hedon calibrated as e.g. me eating a bowl of ice cream) and pain (measured in dolors, with a dolor calibrated as e.g. me slapping myself in the face). My probability distribution over the 10-log of the number of hedons that are equivalent to one dolor is normal with mean 2 and s.d. 2. Someone offers me the chance to undergo one dolor but get N hedons. For what N should I say yes? (A rough numerical sketch of two naive readings of this one appears after the list.)
I have a marshmallow in front of me. I’m 99% sure of a set of moral theories that all say I shouldn’t be eating it because of future negative consequences. However, I have this voice telling me that the only thing that matters in all the history of the universe is that I eat this exact marshmallow in the next exact minute and I assign 1% probability to it being right. What do I do?
I’m 80% sure that I should be utilitarian, 15% sure that I should be egoist, and 5% sure that all that matters is that egoism plays no part in my decision. I’m given a chance to save 100 lives at the price of my own. What do I do?
I’m 100% sure that the only thing that intrinsically matters is whether a light bulb is on or off, but I’m 60% sure that it should be on and 40% sure that it should be off. I’m given an infinite sequence of opportunities to flip the switch (and no opportunity to improve my estimates). What do I do?
There are 1000 people in the universe. I think my life is worth M of theirs, with the 10-log of M uniformly distributed from −3 to 3. I will be given the opportunity to either save my own life or 30 other people’s lives, but first I will be given the opportunity to either save 3 people’s lives or learn the exact value of M with certainty. What do I do?
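To make the hedon/dolor problem above a bit more concrete, here is a rough numerical sketch of two naive readings of it. Neither is claimed to be what the parliamentary model would actually output; the point is only that the break-even N depends enormously on the aggregation rule, because the lognormal tail dominates the mean.

```python
import math

# Problem 2 above under two naive aggregation rules (neither is claimed to be
# the parliamentary model's answer).  log10(hedons per dolor) ~ Normal(2, 2).
mu, sigma = 2.0, 2.0
a = math.log(10)

# Rule 1: maximize expected hedons, so compare N against E[10^Z].
# For Z ~ Normal(mu, sigma^2): E[exp(a*Z)] = exp(a*mu + a^2 * sigma^2 / 2).
break_even_expected = math.exp(a * mu + (a * sigma) ** 2 / 2)
print(f"N under expected exchange rate: {break_even_expected:.3g}")  # ~4e+06

# Rule 2: use the median exchange rate instead of the mean.
break_even_median = 10 ** mu
print(f"N under median exchange rate: {break_even_median:.3g}")      # 100
```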
Why spend only half on U1? Spend (1 - epsilon). And write a lottery ticket giving the U2-oriented decision maker the power with probability epsilon. Since epsilon × infinity = infinity, you still get infinite expected utility (according to U2). And you also get pretty close to the max possible according to U1.
Infinity has uses even beyond allocating hotel rooms. (HT to A. Hajek)
Of course, Hajek’s reasoning also makes it difficult to locate exactly what it is that U2 “says you should do”.
In general, it should be impossible to allocate 0 to U2 in this sense. What’s the probability that an angel comes down and magically forces you to do the U2 decision? Around epsilon, I’d say.
U2 then becomes totally meaningless, and we are back with a bounded utility function.
you ought to end up spending roughly half of your time/resources on what U1 says you should do, and half on what U2 says you should do
That can’t be right. What if U1 says you ought to buy an Xbox, then U2 says you ought to throw it away? Looks like a waste of resources. To avoid such wastes, your behavior must be Bayesian-rational. That means it must be governed by a utility function U3. What U3 does the parliamentary model define? You say it’s not averaging, but it has to be some function defined in terms of U1 and U2.
We’ve discussed a similar problem proposed by Stuart on the mailing list and I believe I gave a good argument (on Jan 21, 2011) that U3 must be some linear combination of U1 and U2 if you want to have nice things like Pareto-optimality. All bargaining should be collapsed into the initial moment, and output the coefficients of the linear combination which never change from that point on.
Right, clearly what I said can’t be true for arbitrary U1 and U2, since there are obvious counterexamples. And I think you’re right that theoretically, bargaining just determines the coefficients of the linear combination of the two utility functions. But it seems hard to apply that theory in practice, whereas if U1 and U2 are largely independent and sublinear in resources, splitting resources between them equally (perhaps with some additional Pareto improvements to take care of any noticeable waste from pursuing two completely separate plans) seems like a fair solution that can be applied in practice.
(ETA side question: does your argument still work absent logical omniscience, for example if one learns additional logical facts after the initial bargaining? It seems like one might not necessarily want to stick with the original coefficients if they were negotiated based on an incomplete understanding of what outcomes are feasible, for example.)
I can’t tell what that combination is, which is odd. The non-smoothness is problematic. You run right up against the constraints—I don’t remember how to deal with this. Can you?
If you have N units of resources which can be devoted to either task A or task B, the ratio of resources used will be the ratio of votes.
I think it depends on what kind of contract you sign. So if I sign a contract that says “we decide according to this utility function” you get something different from a contract that says “We vote yes in these circumstances and no in those circumstances”. The second contract, you can renegotiate, and that can change the utility function.
ETA:
In the case where utility is linear in the set of decisions that go to each side, for any Pareto-optimal allocation that both parties prefer to the starting (random) allocation, you can construct a set of prices that is consistent with that allocation. So you’re reduced to bargaining, which I guess means Nash arbitration.
I don’t know how to make decisions under logical uncertainty in general. But in our example I suppose you could try to phrase your uncertainty about logical facts you might learn in the future in Bayesian terms, and then factor it into the initial calculation.
The first is that expected utility maximization isn’t the same thing as utilitarianism.
The problem is that if I adopt unbounded utility maximization, then I perceive it to converge with utilitarianism. Even completely selfish values seem to converge with utilitarian motives. Not only does every human, however selfish, care about other humans, but they are also instrumental to their own terminal values.
These are surely really, really different things. Utilitarianism says to count people more-or-less equally. However, the sort of utility maximization that actually goes on in people’s heads typically results in people valuing their own existence vastly above that of everyone else. That is because they were built that way by evolution—which naturally favours egoism. So, their utility function says: “Me, me, me! I, me, mine!” This is not remotely like utilitarianism—which explains why utilitarians have such a hard time acting on their beliefs—they are wired up by nature to do something totally different.
Also, you probably should not say “instrumental to their own terminal values”. “Instrumental” in this context usually refers to “instrumental values”. Using it to mean something else is likely to mangle the reader’s mind.
At the end of the day I am left with the decision to either abandon unbounded utility maximization or indulge myself into the craziness of infinite ethics.
So, I think about things like infinite ethics all the time, and it doesn’t seem to disturb me to the extent it does you. You might say, “My brain is set up such that I automatically feel a lot of tension/drama when I feel like I might be ignoring incredibly morally important things.” But it is unclear that this need be the case. I can’t imagine that the resulting strain is useful in the long run. Have you tried jumping up a meta-level, tried to understand and resolve whatever’s causing the strain? I try to think of it as moving in harmony with the Dao.
We do in fact want to save worlds we can’t begin to fathom from dangers we can’t begin to fathom even if it makes us depressed or dead… but if you don’t get any satisfaction from saving the world, you might have a problem with selfishness.
...but if you don’t get any satisfaction from saving the world, you might have a problem with selfishness.
That’s not what I meant. What I meant is the general problem you run into when you take this stuff to its extreme. You end up saving hypothetical beings with a very low probability. That means that you might very well save no being at all, if your model was bogus. I am aware that the number of beings saved often outweighs the low probability...but I am not particularly confident in this line of reasoning, i.e. in the meta-level of thinking about how to maximize good deeds. That leads to all kinds of crazy seeming stuff.
If it does, something almost definitely went wrong. Biases crept in somewhere between the risk assessment, the outside view correction process, the policy-proposing process, the policy-analyzing process, the policy outside view correction process, the ethical injunction check, and the “(anonymously) ask a few smart people whether some part of this is crazy” step. I’m not just adding unnatural steps; each of those should be separate, and each of those is a place where error can throw everything off. Overconfidence plus conjunction fallacy equals crazy seeming stuff. And this coming from the guy who is all about taking ideas seriously.
I don’t feel there is a need for that. You just present these things as tools, not fundamental ideas, also discussing why they are not fundamental and why figuring out fundamental ideas is important. The relevant lesson is along the lines of Fake Utility Functions (the post has “utility function” in it, but it doesn’t seem to need to), applied more broadly to epistemology.
You just present these things as tools, not fundamental ideas, also discussing why they are not fundamental and why figuring out fundamental ideas is important.
Thinking of Bayesianism as fundamental is what made some people (e.g., at least Eliezer and me) think that fundamental ideas exist and are important. (Does that mean we ought to rethink whether fundamental ideas exist and are important?) From Eliezer’s My Bayesian Enlightenment:
The first time I heard of “Bayesianism”, I marked it off as obvious; I didn’t go much further in than Bayes’s rule itself. At that time I still thought of probability theory as a tool rather than a law. I didn’t think there were mathematical laws of intelligence (my best and worst mistake). Like nearly all AGI wannabes, Eliezer2001 thought in terms of techniques, methods, algorithms, building up a toolbox full of cool things he could do; he searched for tools, not understanding. Bayes’s Rule was a really neat tool, applicable in a surprising number of cases.
(Besides, even if your suggestion is feasible, somebody would have to rewrite a great deal of Eliezer’s material to not present Bayesianism as fundamental.)
The ideas of Bayesian credence levels and maximum entropy priors are important epistemic tools that in particular allow you to understand that those kludgy AI tools won’t get you what you want.
(Besides, even if your suggestion is feasible, somebody would have to rewrite a great deal of Eliezer’s material to not present Bayesianism as fundamental.)
(It doesn’t matter for the normative judgment, but I guess that’s why you wrote this in parentheses.)
I don’t think Eliezer misused the idea in the sequences, as Bayesian way of thinking is a very important tool that must be mastered to understand many important arguments. And I guess at this point we are arguing about the sense of “fundamental”.
Agreed, but what I’m mostly griping about is when people who know that utility functions are a really inaccurate model still go ahead and use it, even if prefaced by some number of standard caveats. “Goal system”, for example, conveys a similar abstract idea without all of the questionable and misleading technical baggage (let alone associations with “utilitarianism”), and is more amenable to case-specific caveats. I don’t think we should downvote people for talking about utility functions, especially if they’re newcomers, but there’s a point at which we have to adopt generally higher standards for which concepts we give low K complexity in our language.
I have a vested interest in this. All of the most interesting meta-ethics and related decision theory I’ve seen thus far has come from people associated with SingInst or Less Wrong. If we are to continue to be a gathering place for that kind of mind we can’t let our standards degenerate, and ideally we should be aiming for improvement. From far away it would be way easy to dismiss Less Wrong as full of naive nerds completely ignorant of both philosophy and psychology. From up close it would be easy to dismiss Less Wrong as overly confident in a suspiciously homogeneous set of philosophically questionable meta-ethical beliefs, e.g. some form of utilitarianism. The effects of such appearances are hard to calculate and I think larger than most might intuit. (The extent to which well-meaning folk of an ideology very influenced by Kurzweil have poisoned the well for epistemic-hygienic or technical discussion of technological singularity scenarios, for instance, seems both very large and very saddening.)
From up close it would be easy to dismiss Less Wrong as overly confident in a suspiciously homogeneous set of philosophically questionable meta-ethical beliefs, e.g. utilitarianism.
What is giving this appearance? We have plenty of vocal commenters who are against utilitarianism, top-level posts pointing out problems in utilitarianism, and very few people actually defending utilitarianism. I really don’t get it. (BTW, utilitarianism is usually considered normative ethics, not metaethics.)
Also, utility function != utilitarianism. The fact that some people get confused about this is not a particularly good (additional) reason to stop talking about utility functions.
Here is someone just in this thread who apparently confuses EU-maxing with utilitarianism and apparently thinks that Less Wrong generally advocates utilitarianism. I’ll ask XiXiDu what gave him these impressions, that might tell us something.
ETA: The following comment is outdated. I had a gchat conversation with Wei Dai in which he kindly pointed out some ways in which my intended message could easily and justifiably have been interpreted as a much stronger claim. I’ll add a note to my top level comment warning about this.
Also, utility function != utilitarianism. The fact that some people get confused about this is not a particularly good (additional) reason to stop talking about utility functions.
I never proposed that people stop talking about utility functions, and twice now I’ve described the phenomenon that I’m actually complaining about. Are you trying to address some deeper point you think is implicit in my argument, are you predicting how other people will interpret my argument and arguing against that interpreted version, or what? I may be wrong, but I think it is vitally important for epistemic hygiene that we at least listen to and ideally respond to what others are actually saying. You’re an excellent thinker and seemingly less prone to social biases than most so I am confused by your responses. Am I being dense somehow?
(ETA: The following hypothesis is obviously absurd. Blame it on rationalization. It’s very rare I get to catch myself so explicitly in the act! w00t!) Anyway, the people I have in mind don’t get confused about the difference between reasoning about/with utility functions and being utilitarian, they just take the former as strong evidence of the latter. This doesn’t happen when “utility function” is used technically or in a sand-boxed way, only when it is used in the specific way that I was objecting to. Notice how I said we should be careful about which concepts we use, not which words.
I don’t really get it either. It seems that standard Less Wrong moral philosophy can be seen at some level of abstraction as a divergence from utilitarianism, e.g. because of apparently widespread consequentialism and focus on decision theory. But yeah, you’d think the many disavowments of utilitarianism would have done more to dispel the notion. Does your impression agree with mine though that it seems that many people think Less Wrong is largely utilitarian?
(BTW, utilitarianism is usually considered normative ethics, not metaethics.)
I desperately want a word that covers the space I want to cover that doesn’t pattern-match to something incorrect/fuzzy. (E.g. I think it is important to remember that one’s standard moral beliefs can have an interesting implicit structure at the ethical/metaethical levels, vice versa, et cetera.) Sometimes I use “shouldness” or “morality” but those are either misleading or awkward depending on context. Are there obvious alternatives I’m missing? I used “moral philosophy” above but I’m pretty sure that’s also straight-up incorrect. Epistemology of morality is clunky and probably means something else.
I don’t see how you can get people to stop talking about human utility functions unless you close LW off from newcomers.
Why would you want to stop people talking about human utility functions?!? People should not build economic models of humans? How are such things supposedly misleading? You are concerned people will drag in too much from Von Neumann and Morgenstern? What gives?
By contrast, the idea that humans don’t have utility functions seems to be mysterian nonsense. What sense can be made out of that idea?
As I see it, humans have revealed behavioral tendencies and reflected preferences. I share your reservations about “revealed preferences”, which if they differ from both would have to mean something in between. Maybe revealed preferences would be what’s left after reflection to fix means-ends mistakes but not other reflection, if that makes sense. But when is that concept useful? If you’re going to reflect on means-ends, why not reflect all the way?
Also note that the preferences someone reveals through programming them into a transhuman AI may be vastly different from the preferences someone reveals through other sorts of behavior. My impression is that many people who talk about “revealed preferences” probably wouldn’t count the former as authentic revealed preferences, so they’re privileging behavior that isn’t too verbally mediated, or something. I wonder if this attributing revealed preference to a person rather than a person-situation pair should set off fundamental attribution error alarms.
If we have nothing to go by except behavior, it seems like it’s underdetermined whether we should say it’s preferences or beliefs (aliefs) or akrasia that’s being revealed, given that these factors determine behavior jointly and that we’re defining them by their effects. With reflected preferences it seems like you can at least ask the person which one of these factors they identify as having caused their behavior.
Why are people on Less Wrong still talking about ‘their’ ‘values’ using deviations from a model that assumes they have a ‘utility function’? It’s not enough to explicitly believe and disclaim that this is obviously an incorrect model, at some point you have to actually stop using the model and adopt something else. People are godshatter, they are incoherent, they are inconsistent, they are an abstraction, they are confused about morality, their revealed preferences aren’t their preferences, their revealed preferences aren’t even their revealed preferences, their verbally expressed preferences aren’t even preferences, the beliefs of parts of them about the preferences of other parts of them aren’t their preferences, the beliefs of parts of them aren’t even beliefs, preferences aren’t morality, predisposition isn’t justification, et cetera...
We might make something someday that isn’t godshatter, and we need to practice.
I agree that reforming humans to be rational is hopeless, but it is nevertheless useful to imagine how a rational being would deal with things.
But VNM utility is just one particularly unintuitive property of rational agents. (For instance, I would never ever use a utility function to represent the values of an AGI.) Surely we can talk about rational agents in other ways that are not so confusing?
Also, I don’t think VNM utility takes into account things like bounded computational resources, although I could be wrong. Either way, just because something is mathematically proven to exist doesn’t mean that we should have to use it.
It seems though that the reward function might be extremely complicated in general (in fact I suspect that this paper can be used to show that the reward function can be potentially uncomputable).
I agree with jsteinhardt, thanks for the reference.
I agree that the reward functions will vary in complexity. If you do the usual thing in Solomonoff induction, where the plausibility of a reward function decreases exponentially with its size, then so far as I can tell you can infer reward functions from behavior, if you can infer behavior.
We need to infer a utility function for somebody if we’re going to help them get what they want, since a utility function is the only reasonable description I know of what an agent wants.
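A toy sketch of the kind of inference described above, with invented state/action names and a crude stand-in for description length (this is not the construction from the linked paper): enumerate candidate reward functions, weight them by a prior that decays exponentially with complexity, and keep only those consistent with the observed behavior.

```python
import itertools

# Hypothetical toy setup: candidate reward functions map (state, action) pairs
# to {0, 1}; the prior on a candidate falls off exponentially with a crude
# "description length" (its number of nonzero entries); candidates survive
# only if the observed behavior maximizes them.

states = [0, 1, 2]
actions = ["a", "b"]
observed = {1: "b"}   # we saw the agent pick "b" in state 1

def consistent(reward):
    # The observed action must be a maximizer of the candidate reward
    # in every observed state (ties allowed).
    return all(reward[(s, act)] >= max(reward[(s, a)] for a in actions)
               for s, act in observed.items())

posterior = {}
for values in itertools.product([0, 1], repeat=len(states) * len(actions)):
    reward = dict(zip(itertools.product(states, actions), values))
    complexity = sum(values)          # crude stand-in for description length
    if consistent(reward):
        posterior[values] = 2.0 ** (-complexity)

z = sum(posterior.values())
posterior = {k: v / z for k, v in posterior.items()}
print(max(posterior, key=posterior.get))  # the simplest behavior-consistent reward
```

A real version would have to handle planning, noise, and the uncomputability worry raised above; this only shows the shape of the prior-plus-consistency calculation.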
It was my impression that it was LW orthodoxy that at “reflective equilibrium”, the values and preferences of rational humans can be represented by a utility function. That is:
if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted
… if we or our AI surrogate ever reach that point, then humans have a utility function that captures what we want morally and hedonistically. Or so I understand it.
Yes, our current god-shatter-derived inconsistent values can not be described by a utility function, even as an abstraction. But it seems to me that most of the time what we are actually talking about is what our values ought to be rather than what they are. So, I don’t think that a utility function is a ridiculous abstraction—particularly for folk who strive to be rational.
Yes, our current god-shatter-derived inconsistent values can not be described by a utility function, even as an abstraction.
Actually, yes they can. Any computable agent’s values can be represented by a utility function. That’s one of the good things about modelling using utility functions—they can represent any agent. For details, see here:
Any agent can be expressed as an O-maximizer (as we show in Section 3.1)
Why are people on Less Wrong still talking about ‘their’ ‘values’ using deviations from a model that assumes they have a ‘utility function’? [...] Can we please avoid using the concept of a human “utility function” even as an abstraction?
Nope. Humans do have utility functions—in this sense:
It would be convenient if we could show that all O-maximizers have some characteristic behavior pattern, as we do with reward maximizers in Appendix B. We cannot do this, though, because the set of O-maximizers coincides with the set of all agents; any agents can be written in O-maximizer form. To prove this, consider [...]
Any computable agent has a utility function. That’s the beauty of using a general theory.
Nope. Humans do have utility functions—in this sense:
A trivial sense, that merely labels what an agent does with 1 and what it doesn’t with 0: the Texas Sharpshooter Utility Function. A “utility function” that can only be calculated—even by the agent itself—in hindsight is not a utility function. The agent is not using it to make choices and no observer can use it to make predictions about the agent.
A “utility function” that can only be calculated—even by the agent itself—in hindsight is not a utility function. The agent is not using it to make choices and no observer can use it to make predictions about the agent.
Er, the idea is that you can make a utility-maximising model of the agent—using the specified utility function—that does the same things the agent does if you put it in the same environment.
Can people please stop dissing the concept of a human utility function. Correcting these people is getting tedious—and I don’t want to be boring.
Er, the idea is that you can make a utility-maximising model of the agent—using the specified utility function—that does the same things the agent does if you put it in the same environment.
Doesn’t work. The Texas Sharpshooter utility function described by Dewey cannot be used to make a utility-maximising model of the agent, except by putting a copy of the actual agent into the box, seeing what it does, declaring that to have utility 1, and doing it. The step of declaring it to have utility 1 plays no role in deciding the actions. It is a uselessly spinning cog doing no more work than a suggestive name on a Lisp symbol.
Can people please stop dissing the concept of a human utility function. Correcting these people is getting tedious—and I don’t want to be boring.
I was thinking a similar thought about you. You’re the only person here that I’ve seen taking these trivial utility functions seriously.
Er, the idea is that you can make a utility-maximising model of the agent—using the specified utility function—that does the same things the agent does if you put it in the same environment.
Doesn’t work. The Texas Sharpshooter utility function described by Dewey cannot be used to make a utility-maximising model of the agent, except by putting a copy of the actual agent into the box, seeing what it does, declaring that to have utility 1, and doing it. The step of declaring it to have utility 1 plays no role in deciding the actions. It is a uselessly spinning cog doing no more work than a suggestive name on a Lisp symbol.
The idea here is that—if the agent is computable—then it can be simulated by any other computable system. So, if the map between its inputs and state, and its motor output is computable then we can make another computable system which produces the same map—since all universal computing systems can simulate each other by virtue of being Turing complete (and systems made of e.g. partial recursive functions can simulate each other too—if they are given enough memory to do so).
I mentioned computability at the top, by saying: “any computable agent has a utility function”.
The idea here is that—if the agent is computable—then it can be simulated by any other computable system. So, if the map between its inputs and state, and its motor output is computable then we can make another computable system which produces the same map—since all universal computing systems can simulate each other by virtue of being Turing complete (and systems made of e.g. partial recursive functions can simulate each other too—if they are given enough memory to do so).
I don’t see how this bears on the possibility of modelling every agent by a utility-maximising agent. Dewey’s construction doesn’t work. Its simulation of an agent by a utility-maximising agent just uses the agent to simulate itself and attaches the label “utility=1” to its actions.
Dewey says pretty plainly: “any agents can be written in O-maximizer form”.
I know that he says that. I am saying, I thought pretty plainly, that I disagree with him.
He makes an O-maximiser from an agent, A. Once you have the corresponding O-maximiser, the agent A could be discarded.
He only does that in the earlier paper. His construction is as I described it: define O as doing whatever A does and label the result with utility 1. A is a part of O and cannot be discarded. He even calls this construction trivial himself, but underrates its triviality.
I don’t really understand which problem you are raising. If the O eventually contains a simulated copy of A—so what? O is still a utility-maximiser that behaves the same way that A does if placed in the same environment.
The idea of a utility maximiser as used here is that it assigns utilities to all its possible actions and then chooses the action with the highest utility. O does that—so it qualifies as a utility-maximiser.
The idea of a utility maximiser as used here is that it assigns utilities to all its possible actions and then chooses the action with the highest utility. O does that—so it qualifies as a utility-maximiser.
O doesn’t assign utilities to its actions and then choose the best. It chooses its action (by simulating A), labels it with utility 1, and chooses to perform the action it just chose. The last two steps are irrelevant.
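For concreteness, a minimal sketch (toy code, not from the paper) of the wrapper under dispute. It makes both readings visible at once: the wrapper really does assign utilities and take the argmax, and it reproduces the agent’s behavior; but the utilities are derived from the agent’s own choice, so they do no independent predictive work.

```python
# O simulates A, labels A's choice with utility 1 and everything else with 0,
# then takes the argmax.

def make_O_maximizer(agent_A):
    def O(observation, available_actions):
        chosen = agent_A(observation, available_actions)          # simulate A
        utility = {a: 1 if a == chosen else 0 for a in available_actions}
        return max(available_actions, key=utility.get)            # argmax step
    return O

# Example agent: always picks the lexicographically smallest action.
A = lambda obs, acts: min(acts)
O = make_O_maximizer(A)
assert O(None, ["x", "y", "z"]) == A(None, ["x", "y", "z"])
```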
O doesn’t assign utilities to its actions and then choose the best. It chooses its action (by simulating A), labels it with utility 1, and chooses to perform the action it just chose. The last two steps are irrelevant.
“Irrelevant”? If it didn’t perform those steps, it wouldn’t be a utility maximiser, and then the proof that you can build a utility maximiser which behaves like any computable agent wouldn’t go through. Those steps are an important part of the reason for exhibiting this construction in the first place.
I think that everyone understands the point you’re trying to make—you can usefully model people as having a utility function in a wide variety of cases—but very often people use such models unskillfully, and it causes people like me to facepalm. If you want to model a lot of humans, for instance, it’s simple and decently accurate to model them as having utility functions. Economics, say. And if you have something like AIXI, or as Dawkins might argue a gene, then a utility function isn’t even a model, it’s right there in front of you.
I hypothesize that the real trouble starts when a person confuses the two; he sees or imagines a Far model of humans with utility functions, zooms in on an individual human or zooms in on himself, and thinks he can see the real utility function sitting right there in front of him, like he could with AIXI. Yeah, he knows in the abstract that he doesn’t have direct access to it, but it feels Near. This can lead to a lot of confusion, and it leads people like me to think folk shouldn’t talk about a person’s “utility function” except in cases where it obviously applies.
Even where you can say “Person A has a utility function that assigns 4 utility to getting cheesecake and 2 utility to getting paperclips”, why not say “Agent A”? But that’s not what I facepalm at. I only facepalm when people say they got their “utility function” from natural selection (i.e. ignoring memes), or say they wish they could modify their utility function, et cetera. In many cases it works as an abstraction, but if you’re not at all thinking about EU, why not talk directly about your preferences/values? It’s simpler and less misleading.
This seems like a bit of a different issue—and one that I am not so interested in.
A couple of comments about your examples, though:
I only facepalm when people say they got their “utility function” from natural selection (i.e. ignoring memes), or say they wish they could modify their utility function, et cetera.
For someone like me it is pretty accurate to say that I got my utility function from natural selection acting on DNA genes. Memes influence me, but I try not to let them influence my goals. I regard them as symbiotes: mutualists and pathogens. In principle they could do deals with me that might make me change my goals—but currently I have a powerful bargaining position, their bargaining position is typically weak—and so I just get my way. They don’t get to affect my goals. Those that try get rejected by my memetic immune system. I do not want to become the victim of a memetic hijacking.
As for the implied idea that natural selection does not apply to memes, I’ll try to bite my tongue there.
why not talk directly about your preferences/values? It’s simpler and less misleading.
That seems closely equivalent to me. The cases where people talk about utility functions are mostly those where you want to compare with machines, or conjure up the idea of an expected utility maximiser for some reason. Sometimes even having “utility” in the context is enough for the conversation to wander on to utility functions.
My counsel would be something like: “Don’t like it? Get used to it!” There is not, in fact, anything wrong with it.
As for the implied idea that natural selection does not apply to memes, I’ll try to bite my tongue there.
That totally wasn’t what I meant to imply. I am definitely a universal Darwinist. (You can view pretty much any optimization process as “evolution”, though, so in some cases it’s questionably useful. Bayesian updating is just like population genetics. But with memes it’s obviously a good description.)
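A small check of the “Bayesian updating is just like population genetics” remark, with toy numbers of my own choosing: a Bayes update with likelihoods is the same arithmetic as one generation of replicator dynamics in which each type’s fitness is its likelihood.

```python
prior      = {"h1": 0.5, "h2": 0.3, "h3": 0.2}   # hypothesis weights / type frequencies
likelihood = {"h1": 0.9, "h2": 0.2, "h3": 0.5}   # P(data | h) / fitness of each type

def bayes_update(p, lik):
    unnorm = {h: p[h] * lik[h] for h in p}
    z = sum(unnorm.values())
    return {h: w / z for h, w in unnorm.items()}

def replicator_step(freq, fitness):
    mean_fitness = sum(freq[t] * fitness[t] for t in freq)
    return {t: freq[t] * fitness[t] / mean_fitness for t in freq}

print(bayes_update(prior, likelihood))
print(replicator_step(prior, likelihood))   # identical output
```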
For someone like me it is pretty accurate to say that I got my utility function from natural selection acting on DNA genes.
Yes, but I think you’re rather unusual in this regard; most people aren’t so wary of memes. Might I ask why you prefer genes to memes? This seems odd to me. Largely because humans evolved for memes and with memes. Archetypes, for example. But also because the better memes seem to have done a lot of good in the world. (My genetically evolved cognitive algorithms—that is, the algorithms in my brain that I think aren’t the result of culture, but instead are universal machinery—stare in appreciation at the beauty of cathedrals, and are grateful that economies make my life easier.)
As for the implied idea that natural selection does not apply to memes, I’ll try to bite my tongue there.
That totally wasn’t what I meant to imply.
’s why I tried to bite my tongue—but it was difficult to completely let it go by...
For someone like me it is pretty accurate to say that I got my utility function from natural selection acting on DNA genes.
Yes, but I think you’re rather unusual in this regard; most people aren’t so wary of memes. Might I ask why you prefer genes to memes? This seems odd to me.
Well, I love memes, but DNA-genes built 99% of my ancestors unassisted, and are mostly responsible for building me. They apparently equipped me with a memetic immune system, for weeding out undesirable memes, to allow me to defend myself in those cases where there is a conflict of interests.
Why should I side with the memes? They aren’t even related to me. The best of them are beneficial human symbionts—rather like lettuces and strawberries. I care for them some—but don’t exactly embrace their optimisation targets as my own.
I don’t dispute memes have done a lot of good things in the world. So has Mother Teresa—but that doesn’t mean I have to adopt her goals as my own either.
I know what I want based on naive introspection. If you want to have preferences other than those based on naive introspection, then one of your preferences, based on naive introspection, is not to have preferences that are based on naive introspection. I am not sure how you think you could ever get around intuition; can you please elaborate?
Naive introspection is an epistemic process; it’s one kind of algorithm you can run to figure out aspects of the world, in this case your mind. Because it’s an epistemic process we know that there are many, many ways it can be suboptimal. (Cognitive biases come to mind, of course; Robin Hanson writes a lot about how naive introspection and actual reasons are very divergent. But sheer boundedness is also a consideration; we’re just not very good Bayesians.) Thus, when you say “one of your preferences, based on naive introspection, is not to have preferences that are based on naive introspection,” I think:
If my values are what I think they are, I desire to believe that my values are what I think they are;
If my values aren’t what I think they are, I desire to believe that my values aren’t what I think they are;
Let me not become attached to values that may not be.
Agree completely. (Even though I am guilty of using the word myself below.) But most of this post seems to be based on linearity of preference, which imho can usually only be justified by muddling around with utilities. So maybe that is the place to start?
EDIT: To clarify, I mean that maybe the reason to reject Person 1’s argument is that it implicitly appeals to notions of utility when claiming you should maximize expected DALYs.
Not referring to your post, no, just some aspects of some of the comments on it and the memetic ecology that enables those aspects. I’ll add a meta tag to my comment to make this clearer.
Why are people on Less Wrong still talking about ‘their’ ‘values’ using deviations from a model that assumes they have a ‘utility function’?
Because rational agents care about whatever the hell they want to care about. I, personally, choose to care about my abstract ‘utility function’ with the clear implication that said utility function is something that must be messily constructed from godshatter preferences. And that’s ok because it is what I want to want.
Can we please avoid using the concept of a human “utility function” even as an abstraction
No. It is a useful abstraction. Not using utility function measures does not appear to improve abstract decision making processes. I’m going to stick with it.
Eliezer’s original quote was better. Wasn’t it about superintelligences? Anyway you are not a superintelligence or a rational agent and therefore have not yet earned the right to want to want whatever you think you want to want. Then again I don’t have the right to deny rights so whatever.
Eliezer’s original quote was better. Wasn’t it about superintelligences?
I wasn’t quoting Eliezer; I made (and stand by) a plain English claim. It does happen to be similar in form to a recent instance of Eliezer summarily rejecting PhilGoetz’s declaration that rationalists don’t care about the future. That quote from Eliezer was about “expected-utility-maximising agents”, which would make the quote rather inappropriate in the context.
I will actually strengthen my declaration to:
Because agents can care about whatever the hell they want to care about. (This too should be uncontroversial.)
Anyway you are not a superintelligence or a rational agent and therefore have not yet earned the right to want to want whatever you think you want to want.
An agent does not determine its preferences by mere vocalisation, nor does its belief about its preferences intrinsically make them so. Nevertheless I do care about my utility function (with the vaguely specified caveats). If you could suggest a formalization sufficiently useful for decision making that I could care about it even more than my utility function then I would do so. But you cannot.
Then again I don’t have the right to deny rights so whatever.
No, you don’t. The only way you could apply limits on what I want is via physically altering my molecular makeup. As well as being rather difficult for you to do on any significant scale I could credibly claim that the new physical configuration you constructed from my atoms is other than ‘me’. You can’t get much more of a fundamental destruction of identity than by changing what an agent wants.
I don’t object to you declaring that you don’t have or don’t want to have a utility function. That’s your problem not mine. But I will certainly object to any interventions made that deny that others may have them.
Yeah, I have found that when my mind breaks, I have to relax while it heals before I can engage it in the same sort of vigorous exercise again.
It’s important to remember that that’s what is going on. When you become overloaded and concentrate on other things, you are not neglecting your duty. Your mind needs time to heal and become stronger by processing the new information you’ve given it.
Not necessarily; sometimes people are doing exactly that, depending on what you mean by “overloaded”.
Hmm… I think I’ve slipped into “defending a thesis” mode here. The truth is that the comment you replied to was much too broad, and incorrect as stated, as you correctly pointed out. Thanks for catching my error!
You are right, it depends on the specifics. And if you focus on other things with no plan to ever return to the topic that troubled you, that’s different. But if you’ve learned things that make demands on your mind beyond what it can meet, then failing to do what is in fact impossible for you is not negligence.
Gosh, recurring to jsteinhardt’s comment, everything should add up to normality. If you feel that you’re being led by abstract reasoning in directions that consistently feel wrong, then there’s probably something wrong with the reasoning. My own interest in existential risk reduction is that when I experience a sublime moment I want people to be able to have more of them for a long time. If all there was was a counterintuitive abstract argument, I would think about other things.
Yup, my confidence in the reasoning here on LW and my own ability to judge it is very low. The main reason for this is described in your post above: taken to its logical extreme, you end up doing seemingly crazy stuff like trying to stop people from creating baby universes rather than solving friendly AI.
I don’t know how to deal with this. Where do I draw the line? What are the upper and lower bounds? Are risks from AI above or below the line of uncertainty that I better ignore, given my own uncertainty and the uncertainty in the meta-level reasoning involved?
I am too uneducated and probably not smart enough to figure this out, yet I face the problems that people who are much more educated and intelligent than me devised.
If a line of reasoning is leading you to do something crazy, then that line of reasoning is probably incorrect. I think that is where you should draw the line. If the reasoning is actually correct, then by learning more your intuitions will automatically fall in line with the reasoning and it will not seem crazy anymore.
In this case, I think your intuition correctly diagnoses the conclusion as crazy. Whether you are well-educated or not, the fact that you can tell the difference speaks well of you, although I think you are causing yourself way too much anxiety by worrying about whether you should accept the conclusion after all. Like I said, by learning more you will decrease the inferential distance you will have to traverse in such arguments, and better deduce whether they are valid.
That being said, I still reject these sorts of existential risk arguments based mostly on intuition, plus I am unwilling to do things with high probabilities of failure, no matter how good the situation would be in the event of success.
ETA: To clarify, I think existential risk reduction is a worthwhile goal, but I am uncomfortable with arguments advocating specific ways to reduce risk that rely on very abstract or low-probability scenarios.
There are many arguments in this thread that this extreme isn’t even correct given the questionable premises, have you read them? Regardless, though, it really is important to be psychologically realistic, even if you feel you “should” be out there debating with AI researchers or something. Leading a psychologically healthy life makes it a lot less likely you’ll have completely burnt yourself out 10 years down the line when things might be more important, and it also sends a good signal to other people that you can work towards bettering the world without being some seemingly religiously devout super nerd. One XiXiDu is good, two XiXiDus is a lot better, especially if they can cooperate, and especially if those two XiXiDus can convince more XiXiDus to be a little more reflective and a little less wasteful. Even if the singularity stuff ends up being total bullshit or if something with more “should”-ness shows up, folk like you can always pivot and make the world a better place using some other strategy. That’s the benefit of keeping a healthy mind.
[Edit] I share your discomfort, but this is more a matter of the uncertainty intrinsic to the world we live in than a matter of education/intelligence. At some point a leap of faith is required.
That’s not utility maximisation, that’s utilitarianism. A separate idea, though confusingly named.
IMHO, utilitarianism is a major screw-up for a human being. It is an unnatural philosophy which lacks family values and seems to be used mostly by human beings for purposes of signalling and manipulation.
Two things seem off. The first is that expected utility maximization isn’t the same thing as utilitarianism. Utility maximization can be done even if your utility function doesn’t care at all about utilitarian arguments, or is unimpressed by arguments in favor of scope sensitivity. But even after making that substitution, why do you think Less Wrong advocates utilitarianism? Many prominent posters have spoken out against it both for technical reasons and ethical ones. And arguments for EU maximization, no matter how convincing they are, aren’t at all related to arguments for utilitarianism. I understand what you’re getting at—Less Wrong as a whole seems to think there might be vitally important things going on in the background and you’d be silly to not think about them—but no one here is going to nod their head disapprovingly or shove math in your face if you say “I’m not comfortable acting from a state of such uncertainty”.
And I link to this article again and again these days, but it’s really worth reading: http://lesswrong.com/lw/uv/ends_dont_justify_means_among_humans/ . This doesn’t apply so much to epistemic arguments about whether risks are high or low, but it applies oh-so-much to courses of action that stem from those epistemic arguments.
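To make the distinction concrete, here is a minimal sketch (with made-up action names and numbers) of the point above: the expected-utility machinery is exactly the same whether the utility function it is fed is selfish or impartial, so EU maximization by itself implies nothing about utilitarianism.

```python
# A minimal sketch (not anyone's endorsed model): expected-utility maximization
# is just "pick the action with the highest probability-weighted utility".
# The machinery is identical whether the utility function is selfish or impartial.

def expected_utility_choice(actions, outcomes, utility):
    """actions: list of names; outcomes: action -> list of (probability, outcome);
    utility: outcome -> float. Returns the action maximizing expected utility."""
    return max(actions,
               key=lambda a: sum(p * utility(o) for p, o in outcomes[a]))

# Hypothetical toy outcomes: (my_welfare, total_welfare_of_everyone_else)
outcomes = {
    "keep_money":   [(1.0, (10, 0))],
    "donate_money": [(0.9, (1, 50)), (0.1, (1, 0))],  # donation might be wasted
}
actions = list(outcomes)

selfish     = lambda o: o[0]          # cares only about own welfare
utilitarian = lambda o: o[0] + o[1]   # cares about everyone equally

print(expected_utility_choice(actions, outcomes, selfish))      # keep_money
print(expected_utility_choice(actions, outcomes, utilitarian))  # donate_money
```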
The problem is that if I adopt unbounded utility maximization, then I perceive it to converge with utilitarianism. Even completely selfish values seem to converge with utilitarian motives. Not only does every human, however selfish, care about other humans, but they are also instrumental to their own terminal values.
Solving friendly AI is a matter of survival. As long as you don’t expect to be able to overpower all other agents by creating your own FOOMing AI, the best move is to play the altruism card and argue in favor of making an AI friendly_human.
Another important aspect is that it might be rational to treat copies of you, or agents with similar utility-functions (or ultimate preferences), as yourself (or at least assign non-negligible weight to them). One argument in favor of this is that the goals of rational agents with the same preferences will ultimately converge and are therefore instrumental in realizing what you want.
But even if you care only a little about anything but near-term goals revealed to you by naive introspection, taking into account infinite (or nearly infinite, e.g. 3^^^^3) scenarios can easily outweigh those goals.
All in all, if you adopt unbounded utility maximization and you are not completely alien, you might very well end up pursuing utilitarian motives.
A real world example is my vegetarianism. I assign some weight to sub-human suffering, enough to outweigh the joy of eating meat. Yet I am willing to consume medical comforts that are a result of animal experimentation. I would also eat meat if I would otherwise die. Yet if the suffering were great enough, I would die even for sub-human beings, e.g. to spare 3^^^^3 pigs from being eaten. As a result, if I take into account infinite scenarios, my terminal values converge with those of someone subscribed to utilitarianism.
The problem, my problem, is that if all beings thought like this and sacrificed their own lives, no being would end up maximizing utility. This is contradictory. One might argue that it is incredibly unlikely to be in the position to influence so many other beings, and that one should therefore devote some resources to selfish near-term values. But charities like the SIAI claim that I am in a position to influence enough beings to outweigh any other goals. At the end of the day I am left with the decision to either abandon unbounded utility maximization or indulge in the craziness of infinite ethics.
How about, for example, assigning .5 probability to a bounded utility function (U1), and .5 probability to an unbounded (or practically unbounded) utility function (U2)? You might object that taking the average of U1 and U2 still gives an unbounded utility function, but I think the right way to handle this kind of value uncertainty is by using a method like the one proposed by Bostrom and Ord, in which case you ought to end up spending roughly half of your time/resources on what U1 says you should do, and half on what U2 says you should do.
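For what it’s worth, here is a crude sketch of the “split resources in proportion to credence” reading of that proposal. It is not the full Bostrom/Ord parliamentary model (which also lets the delegates bargain and trade votes), and the labels and numbers are only illustrative.

```python
# A crude sketch of allocating resources in proportion to credence under value
# uncertainty. Not the full parliamentary model; names and numbers are made up.

def allocate_resources(total, credences):
    """credences: dict mapping a value system to your probability that it's right.
    Returns each value system's share of the resources, proportional to credence."""
    norm = sum(credences.values())
    return {theory: total * p / norm for theory, p in credences.items()}

print(allocate_resources(100, {"U1_bounded": 0.5, "U2_unbounded": 0.5}))
# {'U1_bounded': 50.0, 'U2_unbounded': 50.0}
```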
I haven’t studied all the discussions on the parliamentary model, but I’m finding it hard to understand what the implications are, and hard to judge how close to right it is. Maybe it would be enlightening if some of you who do understand the model took a shot at answering (or roughly approximating the answers to) some practice problems? I’m sure some of these are underspecified and anyone who wants to answer them should feel free to fill in details. Also, if it matters, feel free to answer as if I asked about mixed motivations rather than moral uncertainty:
I assign 50% probability to egoism and 50% to utilitarianism, and am going along splitting my resources about evenly between those two. Suddenly and completely unexpectedly, Omega shows up and cuts down my ability to affect my own happiness by a factor of one hundred trillion. Do I keep going along splitting my resources about evenly between egoism and utilitarianism?
I’m a Benthamite utilitarian but uncertain about the relative values of pleasure (measured in hedons, with a hedon calibrated as e.g. me eating a bowl of ice cream) and pain (measured in dolors, with a dolor calibrated as e.g. me slapping myself in the face). My probability distribution over the 10-log of the number of hedons that are equivalent to one dolor is normal with mean 2 and s.d. 2. Someone offers me the chance to undergo one dolor but get N hedons. For what N should I say yes? (A rough worked reading of this one is sketched just after the list.)
I have a marshmallow in front of me. I’m 99% sure of a set of moral theories that all say I shouldn’t be eating it because of future negative consequences. However, I have this voice telling me that the only thing that matters in all the history of the universe is that I eat this exact marshmallow in the next exact minute and I assign 1% probability to it being right. What do I do?
I’m 80% sure that I should be utilitarian, 15% sure that I should be egoist, and 5% sure that all that matters is that egoism plays no part in my decision. I’m given a chance to save 100 lives at the price of my own. What do I do?
I’m 100% sure that the only thing that intrinsically matters is whether a light bulb is on or off, but I’m 60% sure that it should be on and 40% sure that it should be off. I’m given an infinite sequence of opportunities to flip the switch (and no opportunity to improve my estimates). What do I do?
There are 1000 people in the universe. I think my life is worth M of theirs, with the 10-log of M uniformly distributed from −3 to 3. I will be given the opportunity to either save my own life or 30 other people’s lives, but first I will be given the opportunity to either save 3 people’s lives or learn the exact value of M with certainty. What do I do?
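As a worked illustration, here is one naive reading of the hedon/dolor problem above: treat utility as linear in hedons, convert the single dolor at the uncertain exchange rate r = 10^X with X ~ Normal(2, 2), and accept iff N exceeds E[r]. The closed-form lognormal mean puts the threshold at roughly four million hedons.

```python
import math

# One naive reading of the hedon/dolor practice problem above: accept N hedons
# for one dolor iff N > E[r], where r = 10^X and X ~ Normal(mean=2, sd=2).
# Lognormal mean: E[10^X] = exp(mu*ln10 + (sigma*ln10)^2 / 2).
mu, sigma = 2.0, 2.0
ln10 = math.log(10)
threshold = math.exp(mu * ln10 + (sigma * ln10) ** 2 / 2)
print(f"accept if N is at least about {threshold:.2g} hedons")  # roughly 4e6
```

Valuing everything in dolors instead (i.e. taking the expectation of 1/r) gives a threshold smaller by roughly nine orders of magnitude, which is presumably part of what makes this a good test case for any method of handling such uncertainty.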
Why spend only half on U1? Spend (1 - epsilon). And write a lottery ticket giving the U2-oriented decision maker the power with probability epsilon. Since epsilon × infinity = infinity, you still get infinite expected utility (according to U2). And you also get pretty close to the max possible according to U1.
Infinity has uses even beyond allocating hotel rooms. (HT to A. Hajek)
Of course, Hajek’s reasoning also makes it difficult to locate exactly what it is that U2 “says you should do”.
In general, it should be impossible to allocate 0 to U2 in this sense. What’s the probability that an angel comes down and magically forces you to do the U2 decision? Around epsilon, I’d say.
U2 then becomes totally meaningless, and we are back with a bounded utility function.
That can’t be right. What if U1 says you ought to buy an Xbox, then U2 says you ought to throw it away? Looks like a waste of resources. To avoid such wastes, your behavior must be Bayesian-rational. That means it must be governed by a utility function U3. What U3 does the parliamentary model define? You say it’s not averaging, but it has to be some function defined in terms of U1 and U2.
We’ve discussed a similar problem proposed by Stuart on the mailing list and I believe I gave a good argument (on Jan 21, 2011) that U3 must be some linear combination of U1 and U2 if you want to have nice things like Pareto-optimality. All bargaining should be collapsed into the initial moment, and should output the coefficients of the linear combination, which never change from that point on.
Right, clearly what I said can’t be true for arbitrary U1 and U2, since there are obvious counterexamples. And I think you’re right that theoretically, bargaining just determines the coefficients of the linear combination of the two utility functions. But it seems hard to apply that theory in practice, whereas if U1 and U2 are largely independent and sublinear in resources, splitting resources between them equally (perhaps with some additional Pareto improvements to take care of any noticeable waste from pursuing two completely separate plans) seems like a fair solution that can be applied in practice.
(ETA side question: does your argument still work absent logical omniscience, for example if one learns additional logical facts after the initial bargaining? It seems like one might not necessarily want to stick with the original coefficients if they were negotiated based on an incomplete understanding of what outcomes are feasible, for example.)
My thoughts:
You do always get a linear combination.
I can’t tell what that combination is, which is odd. The non-smoothness is problematic. You run right up against the constraints—I don’t remember how to deal with this. Can you?
If you have N units of resources which can be devoted to either task A or task B, the ratios of resource used will be the ratio of votes.
I think it depends on what kind of contract you sign. So if I sign a contract that says “we decide according to this utility function” you get something different from a contract that says “We vote yes in these circumstances and no in those circumstances”. The second contract you can renegotiate, and that can change the utility function.
ETA:
In the case where utility is linear in the set of decisions that go to each side, for any Pareto-optimal allocation that both parties prefer to the starting (random) allocation, you can construct a set of prices that is consistent with that allocation. So you’re reduced to bargaining, which I guess means Nash arbitration.
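Since Nash arbitration got mentioned, here is a rough sketch of what it looks like for splitting N resource units between two utility functions. The utilities and the disagreement point are toy assumptions, chosen only to show the mechanics of maximizing the product of gains.

```python
import math

# A rough sketch of Nash arbitration: split N resource units between two
# utility functions by maximizing the product of each side's gain over its
# disagreement payoff. Utilities and disagreement point are toy assumptions.

N = 100
u1 = lambda x: math.sqrt(x)        # U1's utility, sublinear in resources
u2 = lambda x: math.log(1 + x)     # U2's utility, sublinear in resources
d1 = d2 = 0.0                      # disagreement point: neither side gets anything

def nash_product(x):
    """Nash product for giving x units to U1 and N - x units to U2."""
    return max(u1(x) - d1, 0.0) * max(u2(N - x) - d2, 0.0)

best = max(range(N + 1), key=nash_product)
print(f"give {best} units to U1 and {N - best} to U2")
```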
I don’t know how to make decisions under logical uncertainty in general. But in our example I suppose you could try to phrase your uncertainty about logical facts you might learn in the future in Bayesian terms, and then factor it into the initial calculation.
These are surely really, really different things. Utilitarianism says to count people more-or-less equally. However, the sort of utility maximization that actually goes on in people’s heads typically results in people valuing their own existence vastly above that of everyone else. That is because they were built that way by evolution—which naturally favours egoism. So, their utility function says: “Me, me, me! I, me, mine!” This is not remotely like utilitarianism—which explains why utilitarians have such a hard time acting on their beliefs—they are wired up by nature to do something totally different.
Also, you probably should not say “instrumental to their own terminal values”. “Instrumental” in this context usually refers to “instrumental values”. Using it to mean something else is likely to mangle the reader’s mind.
So, I think about things like infinite ethics all the time, and it doesn’t seem to disturb me to the extent it does you. You might say, “My brain is set up such that I automatically feel a lot of tension/drama when I feel like I might be ignoring incredibly morally important things.” But it is unclear that this need be the case. I can’t imagine that the resulting strain is useful in the long run. Have you tried jumping up a meta-level, tried to understand and resolve whatever’s causing the strain? I try to think of it as moving in harmony with the Dao.
He is not alone. Consider this, for instance:
Utilitarianism is like a plague around here. Perhaps it is down to the founder effect.
We do in fact want to save worlds we can’t begin to fathom from dangers we can’t begin to fathom even if it makes us depressed or dead… but if you don’t get any satisfaction from saving the world, you might have a problem with selfishness.
That’s not what I meant. What I meant is the general problem you run into when you take this stuff to its extreme. You end up saving hypothetical beings with a very low probability. That means that you might very well save no being at all, if your model was bogus. I am aware that the number of beings saved often outweighs the low probability... but I am not particularly confident in this line of reasoning, i.e. in the meta-level of thinking about how to maximize good deeds. That leads to all kinds of crazy-seeming stuff.
If it does, something almost definitely went wrong. Biases crept in somewhere between the risk assessment, the outside view correction process, the policy-proposing process, the policy-analyzing process, the policy outside view correction process, the ethical injunction check, and the “(anonymously) ask a few smart people whether some part of this is crazy” step. I’m not just adding unnatural steps; each of those should be separate, and each of those is a place where error can throw everything off. Overconfidence plus conjunction fallacy equals crazy seeming stuff. And this coming from the guy who is all about taking ideas seriously.
I don’t feel there is a need for that. You just present these things as tools, not fundamental ideas, also discussing why they are not fundamental and why figuring out fundamental ideas is important. The relevant lesson is along the lines of Fake Utility Functions (the post has “utility function” in it, but it doesn’t seem to need it), applied more broadly to epistemology.
Thinking of Bayesianism as fundamental is what made some people (e.g., at least Eliezer and me) think that fundamental ideas exist and are important. (Does that mean we ought to rethink whether fundamental ideas exist and are important?) From Eliezer’s My Bayesian Enlightenment:
(Besides, even if your suggestion is feasible, somebody would have to rewrite a great deal of Eliezer’s material to not present Bayesianism as fundamental.)
The ideas of Bayesian credence levels and maximum entropy priors are important epistemic tools that in particular allow you to understand that those kludgy AI tools won’t get you what you want.
(It doesn’t matter for the normative judgment, but I guess that’s why you wrote this in parentheses.)
I don’t think Eliezer misused the idea in the sequences, as the Bayesian way of thinking is a very important tool that must be mastered to understand many important arguments. And I guess at this point we are arguing about the sense of “fundamental”.
Agreed, but what I’m mostly griping about is when people who know that utility functions are a really inaccurate model still go ahead and use them, even if prefaced by some number of standard caveats. “Goal system”, for example, conveys a similar abstract idea without all of the questionable and misleading technical baggage (let alone associations with “utilitarianism”), and is more amenable to case-specific caveats. I don’t think we should downvote people for talking about utility functions, especially if they’re newcomers, but there’s a point at which we have to adopt generally higher standards for which concepts we give low K complexity in our language.
I have a vested interest in this. All of the most interesting meta-ethics and related decision theory I’ve seen thus far has come from people associated with SingInst or Less Wrong. If we are to continue to be a gathering place for that kind of mind we can’t let our standards degenerate, and ideally we should be aiming for improvement. From far away it would be way easy to dismiss Less Wrong as full of naive nerds completely ignorant of both philosophy and psychology. From up close it would be easy to dismiss Less Wrong as overly confident in a suspiciously homogeneous set of philosophically questionable meta-ethical beliefs, e.g. some form of utilitarianism. The effects of such appearances are hard to calculate and I think larger than most might intuit. (The extent to which well-meaning folk of an ideology very influenced by Kurzweil have poisoned the well for epistemic-hygienic or technical discussion of technological singularity scenarios, for instance, seems both very large and very saddening.)
What is giving this appearance? We have plenty of vocal commenters who are against utilitarianism, top-level posts pointing out problems in utilitarianism, and very few people actually defending utilitarianism. I really don’t get it. (BTW, utilitarianism is usually considered normative ethics, not metaethics.)
Also, utility function != utilitarianism. The fact that some people get confused about this is not a particularly good (additional) reason to stop talking about utility functions.
Here is someone just in this thread who apparently confuses EU-maxing with utilitarianism and apparently thinks that Less Wrong generally advocates utilitarianism. I’ll ask XiXiDu what gave him these impressions, that might tell us something.
ETA: The following comment is outdated. I had a gchat conversation with Wei Dai in which he kindly pointed out some ways in which my intended message could easily and justifiably have been interpreted as a much stronger claim. I’ll add a note to my top level comment warning about this.
I never proposed that people stop talking about utility functions, and twice now I’ve described the phenomenon that I’m actually complaining about. Are you trying to address some deeper point you think is implicit in my argument, are you predicting how other people will interpret my argument and arguing against that interpreted version, or what? I may be wrong, but I think it is vitally important for epistemic hygiene that we at least listen to and ideally respond to what others are actually saying. You’re an excellent thinker and seemingly less prone to social biases than most so I am confused by your responses. Am I being dense somehow?
(ETA: The following hypothesis is obviously absurd. Blame it on rationalization. It’s very rare I get to catch myself so explicitly in the act! w00t!) Anyway, the people I have in mind don’t get confused about the difference between reasoning about/with utility functions and being utilitarian, they just take the former as strong evidence of the latter. This doesn’t happen when “utility function” is used technically or in a sand-boxed way, only when it is used in the specific way that I was objecting to. Notice how I said we should be careful about which concepts we use, not which words.
I don’t really get it either. It seems that standard Less Wrong moral philosophy can be seen at some level of abstraction as a divergence from utilitarianism, e.g. because of apparently widespread consequentialism and focus on decision theory. But yeah, you’d think the many disavowments of utilitarianism would have done more to dispel the notion. Does your impression agree with mine though that it seems that many people think Less Wrong is largely utilitarian?
I desperately want a word that covers the space I want to cover that doesn’t pattern match to incorrect/fuzzy thing. (E.g. I think it is important to remember that one’s standard moral beliefs can have an interesting implicit structure at the ethical/metaethical levels, vice versa, et cetera.) Sometimes I use “shouldness” or “morality” but those are either misleading or awkward depending on context. Are there obvious alternatives I’m missing? I used “moral philosophy” above but I’m pretty sure that’s also straight-up incorrect. Epistemology of morality is clunky and probably means something else.
Why would you want to stop people talking about human utility functions?!? People should not build economic models of humans? How are such things supposedly misleading? You are concerned people will drag in too much from Von Neumann and Morgenstern? What gives?
By contrast, the idea that humans don’t have utility functions seems to be mysterian nonsense. What sense can be made out of that idea?
As I see it, humans have revealed behavioral tendencies and reflected preferences. I share your reservations about “revealed preferences”, which if they differ from both would have to mean something in between. Maybe revealed preferences would be what’s left after reflection to fix means-ends mistakes but not other reflection, if that makes sense. But when is that concept useful? If you’re going to reflect on means-ends, why not reflect all the way?
Also note that the preferences someone reveals through programming them into a transhuman AI may be vastly different from the preferences someone reveals through other sorts of behavior. My impression is that many people who talk about “revealed preferences” probably wouldn’t count the former as authentic revealed preferences, so they’re privileging behavior that isn’t too verbally mediated, or something. I wonder if this attributing revealed preference to a person rather than a person-situation pair should set off fundamental attribution error alarms.
If we have nothing to go by except behavior, it seems like it’s underdetermined whether we should say it’s preferences or beliefs (aliefs) or akrasia that’s being revealed, given that these factors determine behavior jointly and that we’re defining them by their effects. With reflected preferences it seems like you can at least ask the person which one of these factors they identify as having caused their behavior.
Good plausible hypothesis to cache for future priming, but I’m not sure I fully understand it:
More specifically, what process are you envisioning here (or think others might be envisioning)?
We might make something someday that isn’t godshatter, and we need to practice.
I agree that reforming humans to be rational is hopeless, but it is nevertheless useful to imagine how a rational being would deal with things.
But VNM utility is just one particularly unintuitive property of rational agents. (For instance, I would never ever use a utility function to represent the values of an AGI.) Surely we can talk about rational agents in other ways that are not so confusing?
Also, I don’t think VNM utility takes into account things like bounded computational resources, although I could be wrong. Either way, just because something is mathematically proven to exist doesn’t mean that we should have to use it.
Who is sure? If you’re saying that, I hope you are. What do you propose?
I don’t think anybody advocated what you’re arguing against there.
The nearest thing I’m willing to argue for is that one of the following possibilities holds:
We use something that has been mathematically proven to exist, now.
We might be speaking nonsense, depending on whether the concepts we’re using can be mathematically proven to make sense in the future.
Since even irrational agents can be modelled using a utility function, no “reforming” is needed.
How can they be modeled with a utility function?
As explained here:
Thanks for the reference.
It seems though that the reward function might be extremely complicated in general (in fact I suspect that this paper can be used to show that the reward function can be potentially uncomputable).
The whole universe may well be computable—according to the Church–Turing–Deutsch principle. If it isn’t, the above analysis may not apply.
I agree with jsteinhardt, thanks for the reference.
I agree that the reward functions will vary in complexity. If you do the usual thing in Solomonoff induction, where the plausibility of a reward function decreases exponentially with its size, then so far as I can tell you can infer reward functions from behavior, if you can infer behavior.
We need to infer a utility function for somebody if we’re going to help them get what they want, since a utility function is the only reasonable description I know of what an agent wants.
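A toy sketch of the complexity-weighted inference mentioned a couple of comments up: weight candidate reward functions by a 2^-size simplicity prior and keep the ones consistent with observed choices, assuming the agent picks a reward-maximizing option each time. The candidates, sizes, and observations below are all made up.

```python
# Toy complexity-weighted reward inference: simplicity prior 2^-size over
# candidate reward functions, filtered by consistency with observed choices
# (assuming the agent always picks a reward-maximizing option).

observations = [("apple", ["apple", "pear"]),   # (chosen option, options offered)
                ("apple", ["apple", "cake"])]

candidates = {  # name: (reward function, rough description length in bits)
    "likes_apples": (lambda o: 1.0 if o == "apple" else 0.0, 10),
    "likes_sweet":  (lambda o: {"cake": 2.0, "apple": 1.0, "pear": 1.0}.get(o, 0.0), 20),
    "indifferent":  (lambda o: 0.0, 5),  # assigns the same reward to everything
}

def consistent(reward, obs):
    return all(reward(chosen) >= max(reward(o) for o in options)
               for chosen, options in obs)

posterior = {name: 2.0 ** -size
             for name, (reward, size) in candidates.items()
             if consistent(reward, observations)}
total = sum(posterior.values())
posterior = {name: w / total for name, w in posterior.items()}
print(posterior)
```

In this toy run the maximally indifferent reward function ends up dominating the posterior, since it is both simple and consistent with anything; that is one way of seeing the triviality worry raised further down the thread.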
It was my impression that it was LW orthodoxy that at “reflective equilibrium”, the values and preferences of rational humans can be represented by a utility function. That is:
… if we or our AI surrogate ever reach that point, then humans have a utility function that captures what we want morally and hedonistically. Or so I understand it.
Yes, our current god-shatter-derived inconsistent values cannot be described by a utility function, even as an abstraction. But it seems to me that most of the time what we are actually talking about is what our values ought to be rather than what they are. So, I don’t think that a utility function is a ridiculous abstraction—particularly for folk who strive to be rational.
Actually, yes they can. Any computable agent’s values can be represented by a utility function. That’s one of the good things about modelling using utility functions—they can represent any agent. For details, see here:
Nope. Humans do have utility functions—in this sense:
Any computable agent has a utility function. That’s the beauty of using a general theory.
A trivial sense, that merely labels what an agent does with 1 and what it doesn’t with 0: the Texas Sharpshooter Utility Function. A “utility function” that can only be calculated—even by the agent itself—in hindsight is not a utility function. The agent is not using it to make choices and no observer can use it to make predictions about the agent.
Curiously, in what appears to be a more recent version of the paper, the TSUF is not included.
Er, the idea is that you can make a utility-maximising model of the agent—using the specified utility function—that does the same things the agent does if you put it in the same environment.
Can people please stop dissing the concept of a human utility function. Correcting these people is getting tedious—and I don’t want to be boring.
Doesn’t work. The Texas Sharpshooter utility function described by Dewey cannot be used to make a utility-maximising model of the agent, except by putting a copy of the actual agent into the box, seeing what it does, declaring that to have utility 1, and doing it. The step of declaring it to have utility 1 plays no role in deciding the actions. It is a uselessly spinning cog doing no more work than a suggestive name on a Lisp symbol.
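For concreteness, here is a sketch of the trivial construction being argued about (my reading of the thread, not code from the paper): wrap an arbitrary agent in a “utility maximizer” whose utility function assigns 1 to whatever the wrapped agent would do and 0 to everything else, then maximize.

```python
# A sketch of the trivial construction under discussion (my reading of the
# thread, not code from the paper): wrap any agent in a "utility maximizer"
# whose utility function assigns 1 to whatever the wrapped agent would do
# and 0 to everything else, then pick the utility-maximizing action.

def make_o_maximizer(agent):
    """agent: observation -> action. Returns an 'O-maximizer' with the same behavior."""
    def utility(observation, action):
        return 1.0 if action == agent(observation) else 0.0  # the agent stays inside

    def o_maximizer(observation, available_actions):
        return max(available_actions, key=lambda a: utility(observation, a))

    return o_maximizer

# Hypothetical agent that always turns left:
turn_left_agent = lambda observation: "left"
o = make_o_maximizer(turn_left_agent)
print(o("any observation", ["left", "right"]))  # "left"
```

The critique above is visible in the code: the utility function works by calling the wrapped agent, so the maximization step does no work that the agent wasn’t already doing.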
I was thinking a similar thought about you. You’re the only person here that I’ve seen taking these trivial utility functions seriously.
The idea here is that—if the agent is computable—then it can be simulated by any other computable system. So, if the map between its inputs and state, and its motor output is computable then we can make another computable system which produces the same map—since all universal computing systems can simulate each other by virtue of being Turing complete (and systems made of e.g. partial recursive functions can simulate each other too—if they are given enough memory to do so).
I mentioned computability at the top, by saying: “any computable agent has a utility function”.
As far as anyone can tell, the whole universe is computable.
I don’t see how this bears on the possibility of modelling every agent by a utility-maximising agent. Dewey’s construction doesn’t work. Its simulation of an agent by a utility-maximising agent just uses the agent to simulate itself and attaches the label “utility=1” to its actions.
Dewey says pretty plainly: “any agents can be written in O-maximizer form”.
O-maximisers are just plain old utility maximisers. Dewey rechristens them “Observation-Utility Maximizers” in his reworked paper.
He makes an O-maximiser from an agent, A. Once you have the corresponding O-maximiser, the agent A could be discarded.
I know that he says that. I am saying, I thought pretty plainly, that I disagree with him.
He only does that in the earlier paper. His construction is as I described it: define O as doing whatever A does and label the result with utility 1. A is a part of O and cannot be discarded. He even calls this construction trivial himself, but underrates its triviality.
I don’t really understand which problem you are raising. If O eventually contains a simulated copy of A—so what? O is still a utility-maximiser that behaves the same way that A does if placed in the same environment.
The idea of a utility maximiser as used here is that it assigns utilities to all its possible actions and then chooses the action with the highest utility. O does that—so it qualifies as a utility-maximiser.
O doesn’t assign utilities to its actions and then choose the best. It chooses its action (by simulating A), labels it with utility 1, and chooses to perform the action it just chose. The last two steps are irrelevant.
“Irrelevant”? If it didn’t perform those steps, it wouldn’t be a utility maximiser, and then the proof that you can build a utility maximiser which behaves like any computable agent wouldn’t go through. Those steps are an important part of the reason for exhibiting this construction in the first place.
I think that everyone understands the point you’re trying to make—you can usefully model people as having a utility function in a wide variety of cases—but very often people use such models unskillfully, and it causes people like me to facepalm. If you want to model a lot of humans, for instance, it’s simple and decently accurate to model them as having utility functions. Economics, say. And if you have something like AIXI, or as Dawkins might argue a gene, then a utility function isn’t even a model, it’s right there in front of you.
I hypothesize that the real trouble starts when a person confuses the two; he sees or imagines a Far model of humans with utility functions, zooms in on an individual human or zooms in on himself, and thinks he can see the real utility function sitting right there in front of him, like he could with AIXI. Yeah, he knows in the abstract that he doesn’t have direct access to it, but it feels Near. This can lead to a lot of confusion, and it leads people like me to think folk shouldn’t talk about a person’s “utility function” except in cases where it obviously applies.
Even where you can say “Person A has a utility function that assigns 4 utility to getting cheesecake and 2 utility to getting paperclips”, why not say “Agent A”? But that’s not what I facepalm at. I only facepalm when people say they got their “utility function” from natural selection (i.e. ignoring memes), or say they wish they could modify their utility function, et cetera. In many cases it works as an abstraction, but if you’re not at all thinking about EU, why not talk directly about your preferences/values? It’s simpler and less misleading.
This seems like a bit of a different issue—and one that I am not so interested in.
A couple of comments about your examples, though:
For someone like me it is pretty accurate to say that I got my utility function from natural selection acting on DNA genes. Memes influence me, but I try not to let them influence my goals. I regard them as symbiotes: mutualists and pathogens. In principle they could do deals with me that might make me change my goals—but currently I have a powerful bargaining position, their bargaining position is typically weak—and so I just get my way. They don’t get to affect my goals. Those that try get rejected by my memetic immune system. I do not want to become the victim of a memetic hijacking.
As for the implied idea that natural selection does not apply to memes, I’ll try to bite my tongue there.
That seems closely equivalent to me. The cases where people talk about utility functions are mostly those where you want to compare with machines, or conjure up the idea of an expected utility maximiser for some reason. Sometimes even having “utility” in the context is enough for the conversation to wander on to utility functions.
My counsel would be something like: “Don’t like it? Get used to it!” There is not, in fact, anything wrong with it.
That totally wasn’t what I meant to imply. I am definitely a universal Darwinist. (You can view pretty much any optimization process as “evolution”, though, so in some cases it’s questionably useful. Bayesian updating is just like population genetics. But with memes it’s obviously a good description.)
Yes, but I think you’re rather unusual in this regard; most people aren’t so wary of memes. Might I ask why you prefer genes to memes? This seems odd to me. Largely because humans evolved for memes and with memes. Archetypes, for example. But also because the better memes seem to have done a lot of good in the world. (My genetically evolved cognitive algorithms—that is, the algorithms in my brain that I think aren’t the result of culture, but instead are universal machinery—stare in appreciation at the beauty of cathedrals, and are grateful that economies make my life easier.)
’s why I tried to bite my tongue—but it was difficult to completely let it go by...
Well, I love memes, but DNA-genes built 99% of my ancestors unassisted, and are mostly responsible for building me. They apparently equipped me with a memetic immune system, for weeding out undesirable memes, to allow me to defend myself in those cases where there is a conflict of interests.
Why should I side with the memes? They aren’t even related to me. The best of them are beneficial human symbionts—rather like lettuces and strawberries. I care for them some—but don’t exactly embrace their optimisation targets as my own.
I don’t dispute memes have done a lot of good things in the world. So has Mother Teresa—but that doesn’t mean I have to adopt her goals as my own either.
I know what I want based on naive introspection. If you want to have preferences other than those based on naive introspection, then one of your preferences, based on naive introspection, is not to have preferences that are based on naive introspection. I am not sure how you think you could ever get around intuition; can you please elaborate?
Naive introspection is an epistemic process; it’s one kind of algorithm you can run to figure out aspects of the world, in this case your mind. Because it’s an epistemic process we know that there are many, many ways it can be suboptimal. (Cognitive biases come to mind, of course; Robin Hanson writes a lot about how naive introspection and actual reasons are very divergent. But sheer boundedness is also a consideration; we’re just not very good Bayesians.) Thus, when you say “one of your preferences, based on naive introspection, is not to have preferences that are based on naive introspection,” I think:
If my values are what I think they are,
I desire to believe that my values are what I think they are;
If my values aren’t what I think they are,
I desire to believe that my values aren’t what I think they are;
Let me not become attached to values that may not be.
Agree completely. (Even though I am guilty of using the word myself below.) But most of this post seems to be based on linearity of preference, which imho can usually only be justified by muddling around with utilities. So maybe that is the place to start?
EDIT: To clarify, I mean that maybe the reason to reject Person 1’s argument is that it implicitly appeals to notions of utility when claiming you should maximize expected DALYs.
I agree with most of what you say here; is your comment referring to my post and if so which part?
Not referring to your post, no, just some aspects of some of the comments on it and the memetic ecology that enables those aspects. I’ll add a meta tag to my comment to make this clearer.
Because rational agents care about whatever the hell they want to care about. I, personally, choose to care about my abstract ‘utility function’ with the clear implication that said utility function is something that must be messily constructed from godshatter preferences. And that’s ok because it is what I want to want.
No. It is a useful abstraction. Not using utility function measures does not appear to improve abstract decision making processes. I’m going to stick with it.
Eliezer’s original quote was better. Wasn’t it about superintelligences? Anyway you are not a superintelligence or a rational agent and therefore have not yet earned the right to want to want whatever you think you want to want. Then again I don’t have the right to deny rights so whatever.
I wasn’t quoting Eliezer; I made (and stand by) a plain English claim. It does happen to be similar in form to a recent instance of Eliezer summarily rejecting PhilGoetz’s declaration that rationalists don’t care about the future. That quote from Eliezer was about “expected-utility-maximising agents”, which would make the quote rather inappropriate in the context.
I will actually strengthen my declaration to:
Because agents can care about whatever the hell they want to care about. (This too should be uncontroversial.)
An agent does not determine its preferences by mere vocalisation and nor does its belief about its preference intrinsically make them so. Nevertheless I do care about my utility function (with the vaguely specified caveats). If you could suggest a formalization sufficiently useful for decision making that I could care about it even more than my utility function then I would do so. But you cannot.
No, you don’t. The only way you could apply limits on what I want is via physically altering my molecular makeup. As well as being rather difficult for you to do on any significant scale I could credibly claim that the new physical configuration you constructed from my atoms is other than ‘me’. You can’t get much more of a fundamental destruction of identity than by changing what an agent wants.
I don’t object to you declaring that you don’t have or don’t want to have a utility function. That’s your problem not mine. But I will certainly object to any interventions made that deny that others may have them.