Model Uncertainty, Pascalian Reasoning and Utilitarianism
Related to: Confidence levels inside and outside an argument, Making your explicit reasoning trustworthy
A mode of reasoning that sometimes comes up in discussion of existential risk is the following.
Person 1: According to model A (e.g. some Fermi calculation with probabilities coming from certain reference classes), pursuing course of action X will reduce existential risk by 10^-5; existential risk has an opportunity cost of 10^25 DALYs (*), therefore model A says the expected value of pursuing course of action X is 10^20 DALYs. Since course of action X requires 10^9 dollars, the number of DALYs saved per dollar invested in course of action X is 10^11. Hence course of action X is 10^10 times as cost-effective as the most cost-effective health interventions in the developing world.
Person 2: I reject model A; I think that the appropriate probabilities for the Fermi calculation may be much smaller than model A claims, and I think that model A fails to incorporate many relevant hypotheticals which would drag the probability down still further.
Person 1: Sure, it may be that model A is totally wrong, but there's nothing obviously very wrong with it. Surely you'd assign at least a 10^-5 chance that it's on the mark? More confidence than this would seem to indicate overconfidence bias; after all, plenty of smart people believe in model A, and it can't be that likely that they're all wrong. So unless you think that the side-effects of pursuing course of action X are systematically negative, even your own implicit model gives a figure of at least 10^5 $/DALY saved, and that's a far better investment than any other philanthropic effort that you know of, so you should fund course of action X even if you think that model A is probably wrong.
(*) As Jonathan Graehl mentions, DALY stands for Disability-adjusted life year.
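For concreteness, here is Person 1's arithmetic as a minimal sketch. All inputs are the figures quoted in the dialogue above; the final comparison to developing-world health interventions is left out because it depends on a baseline cost per DALY that the dialogue doesn't state.

```python
# A minimal sketch of Person 1's Fermi arithmetic; every input is a figure
# quoted in the dialogue above, nothing more.
risk_reduction = 1e-5        # reduction in existential risk claimed by model A
catastrophe_cost = 1e25      # opportunity cost of existential catastrophe, in DALYs
cost_dollars = 1e9           # cost of pursuing course of action X

expected_dalys = risk_reduction * catastrophe_cost    # 1e20 DALYs under model A
dalys_per_dollar = expected_dalys / cost_dollars      # 1e11 DALYs per dollar under model A

# Person 1's further move: even granting model A only a 1e-5 chance of being
# right, the discounted figure remains enormous.
credence_in_model_a = 1e-5
discounted = credence_in_model_a * dalys_per_dollar

print(f"Under model A:      {dalys_per_dollar:.0e} DALYs per dollar")
print(f"Discounted by 1e-5: {discounted:.0e} DALYs per dollar")
```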
I feel very uncomfortable with the sort of argument that Person 1 advances above. My best attempt at a summary of where my discomfort comes from is that it seems like one could make this sort of argument to advance any number of courses of action, many of which would be at odds with one another.
I have difficulty parsing where my discomfort comes from in more detail. There may be underlying game-theoretic considerations; there may be underlying considerations based on the anthropic principle; it could be that the probability that one ascribes to model A being correct should be much lower than 10^-5 on account of humans' poor ability to construct accurate models, and that I shouldn't take it too seriously when some people subscribe to such models; it could be that I'm irrationally influenced by social pressures against accepting unusual arguments that most people wouldn't feel comfortable accepting; it could be that in such extreme situations I value certainty over utility maximization; it could be some combination of all of these. I'm not sure how to disentangle the relevant issues in my mind.
One case study that I think may be useful to consider in juxtaposition with the above is as follows. In Creating Infinite Suffering: Lab Universes, Alan Dawrst says:
Abstract. I think there’s a small but non-negligible probability that humans or their descendants will create infinitely many new universes in a laboratory. Under weak assumptions, this would entail the creation of infinitely many sentient organisms. Many of those organisms would be small and short-lived, and their lives in the wild would often involve far more pain than happiness. Given the seriousness of suffering, I conclude that creating infinitely many universes would be infinitely bad.
One may not share Dawrst's intuition that pain would outweigh happiness in such universes, but regardless, the hypothetical of lab universes raises the possibility that all of the philanthropy one engages in with a view toward utility maximization should focus on creating, or preventing the creation of, infinitely many lab universes (according to whether one views the expected value of such a universe as positive or negative). This example is in the spirit of Pascal's wager, but I prefer it because the premises are less metaphysically dubious.
One can argue that if one is willing to accept the argument given by Person 1 above, one should be willing to accept the argument that one should devote all of one’s resources to studying and working toward or against lab universes.
Here, various attempted counterarguments seem uncompelling:
Counterargument #1: The issue here is with the infinite; we should ignore infinite ethics on the grounds that it's beyond the range of human comprehension, and focus on finite ethics.
Response: The issue here doesn't seem to be with infinities; one can replace "infinitely many lab universes" with "3^^^3 lab universes" (or some other sufficiently large number) and be faced with essentially the same conundrum.
Counterargument #2: The hypothetical upside of a lab universe perfectly cancels out the hypothetical downside of such a universe, so we can treat lab universes as having expected value zero.
Response: If this is true, it's certainly not obviously true; there are physical constraints on the sorts of lab universes that could arise, and it's probably not the case that for every universe there's an equal and opposite universe. Moreover, we do have a means of investigating the expected utility of a lab universe: we have our own universe as a model; we can contemplate whether it has aggregate positive or negative utility and refine this understanding by researching fundamental physics, hypothesizing the variation in initial conditions and physical laws among lab universes, and attempting to extrapolate the utility or disutility of an average such universe.
Counterargument #3: Even if one's focus should be on lab universes, such a focus reduces to a focus on creating a Friendly AI; such an entity would be much better than us at reasoning about whether or not lab universes are a good thing and how to go about affecting their creation.
Response: Here too, if this is true it's not obvious. Even if one succeeds in creating an AGI that's sympathetic to human values, such an AGI may not subscribe to utilitarianism; after all, many humans don't, and it's not clear that this is because their volitions have not been coherently extrapolated; maybe some humans have volitions which coherently extrapolate to being heavily utilitarian whereas others don't. If one is in the latter category, one may do better to focus on lab universes than to focus on FAI (for example, if one believes that lab universes would have average negative utility, one might work to increase existential risk so as to avert the possibility that a nonutilitarian FAI creates infinitely many universes in a lab because some people find it cool).
Counterargument #4: The universes so created would be parallel universes and parallel copies of a given organism should be considered equivalent to a single such organism, thus their total utility is finite and the expected utility of creating a lab universe is smaller than the expected utility in our own universe.
Response: Regardless of whether one considers parallel copies of a given organism equivalent to a single organism, there's some nonzero chance that the universes created would diverge in a huge number of ways; this could make the expected value of the creation of universes arbitrarily large, depending on how the probability that one assigns to the creation of n essentially distinct universes varies with n (this is partially an empirical/mathematical question; I'm not claiming that the answer goes one way or the other).
Counterargument #5: The statement "creating infinitely many universes would be infinitely bad" is misleading; as humans we experience diminishing marginal utility with respect to helping n sentient beings as n grows, and this is not exclusively due to scope insensitivity; rather, the concavity of the function at least partially reflects terminal values.
Response: Even if one decides that this is true, one still has the question of how quickly the diminishing marginal utility sets in, and any choice here seems somewhat arbitrary, so this line of reasoning seems unsatisfactory. Depending on the choice that one makes, one may reject Person 1's argument on the grounds that after a certain point one just doesn't care very much about helping additional people.
I’ll end with a couple of questions for Less Wrong:
1. Is the suggestion that one's utilitarian efforts should be primarily focused on the possibility of lab universes an example of "explicit reasoning gone nuts"? (cf. Anna's post Making your explicit reasoning trustworthy).
2. If so, is the argument advanced by Person 1 above also an example of "explicit reasoning gone nuts"? If the two cases are different, then why?
3. If one rejects one or both of Person 1's argument and the argument that utilitarian efforts should be focused on lab universes, how does one reconcile this with the idea that one should assign some probability to the notion that one's model is wrong (or that somebody else's model is right)?
The more I think about it, the more I’m tempted to just bite the bullet and accept that my “empirically observed utility function” (to the degree that such a thing even makes sense) may be bounded, finite, with a lot of its variation spent measuring relatively local things like the prosaic well being of myself and my loved ones, so that there just isn’t much left over to cover anyone outside my monkey sphere except via a generic virtue-ethical term for “being a good citizen n’stuff”.
A first-order approximation might be mathematically modeled by taking all the various utilities having to do with "weird infinite utilities", normalizing all those scenarios by "my ability to affect those outcomes" (so my intrinsic concern for things decreased when I "gave up" on affecting them… which seems broken but also sorta seems like how things might actually work) and then running what's left through a sigmoid function so their impact on my happiness and behavior is finite and marginal… claiming maybe 1% of my consciously strategic planning time and resource expenditures under normal circumstances.
Under this model, the real meat of my utility function would actually be characterized by the finite number of sigmoid terms that sum together, what each one handles, and the multiplicative factors attached to each term. All the weird “infinity arguments” are probably handled by a generic term for “other issues” that is already handling political tragedies playing out on different continents and the ongoing mass extinction event and looming singularity issues and so on. In comparison, this scheme would need quite a few terms for things like “regular bowel movements”, that are usually near optimal and have multiplicative factors such that any of them can dominate pretty much the entire utility function if anything goes wrong in these domains.
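A toy rendering of the shape of this model; the domains, weights, and scores below are invented purely for illustration and aren't meant to describe anyone's actual values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Invented domains, weights, and "how well is this going" scores, just to
# illustrate the structure: each term is a bounded sigmoid scaled by a
# multiplicative factor, so no single concern can blow up without bound.
terms = {
    "own wellbeing":        (10.0,  2.0),   # (weight, current domain score)
    "loved ones":           (10.0,  1.5),
    "being a good citizen": ( 1.0,  0.5),
    "weird infinite scenarios, discounted by ability to affect them":
                            ( 0.1, -1.0),
}

total = sum(weight * sigmoid(score) for weight, score in terms.values())
print(f"total 'utility' under this toy model: {total:.2f}")
```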
Spelling this out as a possible description of how my "utility function" actually works, it occurs to me to wonder: how far from optimal would an agent that was built to work this way be?
...
“No one can save the world with an upset tummy!”
Compare and contrast from Adam Smith’s Theory of Moral Sentiments: “And when all this fine philosophy was over, when all these humane sentiments had been once fairly expressed, he would pursue his business or his pleasure, take his repose or his diversion, with the same ease and tranquillity, as if no such accident had happened. The most frivolous disaster which could befall himself would occasion a more real disturbance. If he was to lose his little finger to-morrow, he would not sleep to-night; but, provided he never saw them, he will snore with the most profound security over the ruin of a hundred millions of his brethren, and the destruction of that immense multitude seems plainly an object less interesting to him, than this paltry misfortune of his own.”
I largely buy the framework of this comment, as I’ve said elsewhere.
It does still leave the question of how to go about “being a good citizen n’stuff” with the limited portion of your efforts you want to invest in doing so. Most of multifoliaterose’s questions could be reframed in those terms.
Thanks for your thoughtful comment.
I agree that it’s unclear that it makes sense to talk about humans having utility functions; my use of the term was more a manner of speaking than anything else.
It sounds like you're going with something like Counterargument #5, with something like the Dunbar number determining the point at which your concern for others caps off, augmented by some desire to "be a good citizen n'stuff".
Something similar may be true of me, but I'm not sure. I know that I derive a lot of satisfaction from feeling like I'm making the world a better place, and am uncomfortable with the idea that I don't care about people who I don't know (in light of my abstract belief in the space and time independence of moral value); but maybe the intensity of the relevant feelings is sufficiently diminished when the magnitude of the uncertainty becomes huge that other interests predominate.
I feel like if I could prove that course X maximizes expected utility, then my interest in pursuing course X would increase dramatically (independently of how small the probabilities are and of the possibility of doing more harm than good), but that having a distinct sense that I'll probably change my mind about whether pursuing course X was a good idea significantly decreases my interest in pursuing course X. I find it difficult to determine whether this reflects my "utility function" or whether there's a logical argument coming from utilitarianism against pursuing courses that one will probably regret (e.g. probable burnout and disillusionment repelling potentially utilitarian bystanders).
Great Adam Smith quotation; I’ve seen it before, but it’s good to have a reference.
Obligatory OB link: Bostrom and Ord’s parliamentary model for normative uncertainty/mixed motivations.
They do have them—in this sense:
I think the use of both DALYs and dollars in the main article is worth talking about, in context of some of the things you have mentioned. Being a stupid human, I find that it is generally useful for me to express utility to myself in dollars, because I possess a pragmatic faculty for thinking about dollars. I might not bend over to pick up one dollar. I might spend a couple of hours working for $100. There isn’t much difference between one billion and two billion dollars, from my current perspective.
When you ask me how many dollars I would spend to avert the deaths of a million people, the answer can’t be any larger than the amount of dollars I actually have. If you ask me how many dollars I would spend to avoid the suffering associated with a root canal, it could be some noticeable percentage of my net worth.
When we start talking about decisions where thousands of DALYs hang in the balance, my monkey brain has no intuitive sense of the scope of this, and no pragmatic way of engaging with it. I don’t have the resources or power to purchase even one DALY-equivalent under my own valuation!
If the net utility of the universe is actually being largely controlled by infinitesimal probabilities of enormous utilities, then my sense of scale for both risk and value is irrelevant. It hardly matters how many utilons I attribute to a million starving people when I have only so much time and so much money.
I don’t know what, if anything, to conclude from this, except to say that it makes me feel unsuited to reasoning about anything outside the narrow human scope of likelihoods and outcomes.
ETA: This is a meta comment about some aspects of some comments on this post and what I perceive to be problems with the sort of communication/thinking that leads to the continued existence of those aspects. This comment is not meant to be taken as a critique of the original post.
ETA2: This comment lacks enough concreteness to act as a serious consideration in favor of one policy over another. Please disregard it as a suggestion for how LW should normatively respond to something. Instead one might consider if one might personally benefit from enacting a policy I might be suggesting, on an individual basis.
Why are people on Less Wrong still talking about 'their' 'values' using deviations from a model that assumes they have a 'utility function'? It's not enough to explicitly believe and disclaim that this is obviously an incorrect model; at some point you have to actually stop using the model and adopt something else. People are godshatter, they are incoherent, they are inconsistent, they are an abstraction, they are confused about morality, their revealed preferences aren't their preferences, their revealed preferences aren't even their revealed preferences, their verbally expressed preferences aren't even preferences, the beliefs of parts of them about the preferences of other parts of them aren't their preferences, the beliefs of parts of them aren't even beliefs, preferences aren't morality, predisposition isn't justification, et cetera…
Can we please avoid using the concept of a human “utility function” even as an abstraction, unless it obviously makes sense to do so? If you’re specific enough and careful enough it can work out okay (e.g. see JenniferRM’s comment) but generally it is just a bad idea. Am I wrong to think this is both obviously and non-obviously misleading in a multitude of ways?
Don’t you think people need to go through an “ah ha, there is such a thing as rationality, and it involves Bayesian updating and expected utility maximization” phase before moving on to “whoops, actually we don’t really know what rationality is and humans don’t seem to have utility functions”? I don’t see how you can get people to stop talking about human utility functions unless you close LW off from newcomers.
I was pretty happy before LW, until I learnt about utility maximization. It tells me that I ought to do things I don't want to do on any level other than a highly abstract intellectual one. I don't even get the smallest bit of satisfaction out of it, just depression.
Saving galactic civilizations from superhuman monsters burning the cosmic commons, walking into death camps so as to reduce the likelihood of being blackmailed, discounting people by the length of their address in the multiverse... taking all that seriously while keeping one's sanity is difficult for some people.
What LW means by 'rationality' is winning in a hard-to-grasp sense that is often completely detached from the happiness and desires of the individual.
If this is really having that effect on you, why not just focus on things other than abstract large-scale ethical dilemmas, e.g. education, career, relationships? Progress on those fronts is likely to make you happier, and if you want to come back to mind-bending ethical conundrums you’ll then be able to do so in a more productive and pleasant way. Trying to do something you’re depressed and conflicted about is likely to be ineffective or backfire.
Yeah, I have found that when my mind breaks, I have to relax while it heals before I can engage it in the same sort of vigorous exercise again.
It’s important to remember that that’s what is going on. When you become overloaded and concentrate on other things, you are not neglecting your duty. Your mind needs time to heal and become stronger by processing the new information you’ve given it.
Not necessarily, sometimes people are doing exactly that, depending on what you mean by “overloaded”.
Hmm… I think I’ve slipped into “defending a thesis” mode here. The truth is that the comment you replied to was much too broad, and incorrect as stated, as you correctly pointed out. Thanks for catching my error!
You are right, it depends on the specifics. And if you focus on other things with no plan to ever return to the topic that troubled you, that’s different. But if you’ve learned things that make demands on your mind beyond what it can meet, then failing to do what is in fact impossible for you is not negligence.
Gosh, recurring to jsteinhardt's comment, everything should add up to normality. If you feel that you're being led by abstract reasoning in directions that consistently feel wrong, then there's probably something wrong with the reasoning. My own interest in existential risk reduction is that when I experience a sublime moment, I want people to be able to have more of them for a long time. If all there was was a counterintuitive abstract argument, I would think about other things.
Yup, my confidence in the reasoning here on LW and in my own ability to judge it is very low. The main reason for this is described in your post above: taken to its logical extreme, you end up doing seemingly crazy stuff like trying to stop people from creating baby universes rather than solving Friendly AI.
I don't know how to deal with this. Where do I draw the line? What are the upper and lower bounds? Are risks from AI above or below the line of uncertainty that I had better ignore, given my own uncertainty and the uncertainty in the meta-level reasoning involved?
I am too uneducated and probably not smart enough to figure this out, yet I face the problems that people who are much more educated and intelligent than me devised.
If a line of reasoning is leading you to do something crazy, then that line of reasoning is probably incorrect. I think that is where you should draw the line. If the reasoning is actually correct, then by learning more your intuitions will automatically fall in line with the reasoning and it will not seem crazy anymore.
In this case, I think your intuition correctly diagnoses the conclusion as crazy. Whether you are well-educated or not, the fact that you can tell the difference speaks well of you, although I think you are causing yourself way too much anxiety by worrying about whether you should accept the conclusion after all. Like I said, by learning more you will decrease the inferential distance you will have to traverse in such arguments, and better deduce whether they are valid.
That being said, I still reject these sorts of existential risk arguments based mostly on intuition, plus I am unwilling to do things with high probabilities of failure, no matter how good the situation would be in the event of success.
ETA: To clarify, I think existential risk reduction is a worthwhile goal, but I am uncomfortable with arguments advocating specific ways to reduce risk that rely on very abstract or low-probability scenarios.
There are many arguments in this thread that this extreme isn't even correct given the questionable premises; have you read them? Regardless, though, it really is important to be psychologically realistic, even if you feel you "should" be out there debating with AI researchers or something. Leading a psychologically healthy life makes it a lot less likely you'll have completely burnt yourself out 10 years down the line when things might be more important, and it also sends a good signal to other people that you can work towards bettering the world without being some seemingly religiously devout super nerd. One XiXiDu is good, two XiXiDus are a lot better, especially if they can cooperate, and especially if those two XiXiDus can convince more XiXiDus to be a little more reflective and a little less wasteful. Even if the singularity stuff ends up being total bullshit, or if something with more "should"-ness shows up, folk like you can always pivot and make the world a better place using some other strategy. That's the benefit of keeping a healthy mind.
[Edit] I share your discomfort, but this is more a matter of the uncertainty intrinsic to the world that we live in than a matter of education/intelligence. At some point a leap of faith is required.
That’s not utility maximisation, that’s utilitarianism. A separate idea, though confusingly named.
IMHO, utilitarianism is a major screw-up for a human being. It is an unnatural philosophy which lacks family values and seems to be used mostly by human beings for purposes of signalling and manipulation.
Two things seem off. The first is that expected utility maximization isn’t the same thing as utilitarianism. Utility maximization can be done even if your utility function doesn’t care at all about utilitarian arguments, or is unimpressed by arguments in favor of scope sensitivity. But even after making that substitution, why do you think Less Wrong advocates utilitarianism? Many prominent posters have spoken out against it both for technical reasons and ethical ones. And arguments for EU maximization, no matter how convincing they are, aren’t at all related to arguments for utilitarianism. I understand what you’re getting at—Less Wrong as a whole seems to think there might be vitally important things going on in the background and you’d be silly to not think about them—but no one here is going to nod their head disapprovingly or shove math in your face if you say “I’m not comfortable acting from a state of such uncertainty”.
And I link to this article again and again these days, but it’s really worth reading: http://lesswrong.com/lw/uv/ends_dont_justify_means_among_humans/ . This doesn’t apply so much to epistemic arguments about whether risks are high or low, but it applies oh-so-much to courses of action that stem from those epistemic arguments.
The problem is that if I adopt unbounded utility maximization, then I perceive it to converge with utilitarianism. Even completely selfish values seem to converge with utilitarian motives. Not only does every human, however selfish, care about other humans; other humans are also instrumental to one's own terminal values.
Solving Friendly AI means surviving. As long as you don't expect to be able to overpower all other agents by creating your own FOOMing AI, the best move is to play the altruism card and argue in favor of making an AI friendly_human.
Another important aspect is that it might be rational to treat copies of you, or agents with similar utility-functions (or ultimate preferences), as yourself (or at least assign non-negligible weight to them). One argument in favor of this is that the goals of rational agents with the same preferences will ultimately converge and are therefore instrumental in realizing what you want.
But even if you care only a little about anything but the near-term goals revealed to you by naive introspection, taking into account infinite (or nearly infinite, e.g. 3^^^^3) scenarios can easily outweigh those goals.
All in all, if you adopt unbounded utility maximization and you are not completely alien, you might very well end up pursuing utilitarian motives.
A real-world example is my vegetarianism. I assign some weight to sub-human suffering, enough to outweigh the joy of eating meat. Yet I am willing to consume medical comforts that are a result of animal experimentation. I would also eat meat if I would otherwise die. Yet, if the suffering were great enough, I would die even for sub-human beings, e.g. to prevent 3^^^^3 pigs from being eaten. As a result, if I take into account infinite scenarios, my terminal values converge with those of someone subscribed to utilitarianism.
The problem, my problem, is that if all beings thought like this and sacrificed their own lives, no being would end up maximizing utility. This is contradictory. One might argue that it is incredibly unlikely that one is in a position to influence so many other beings, and that one should therefore devote some resources to selfish near-term values. But charities like the SIAI claim that I am in a position to influence enough beings to outweigh any other goals. At the end of the day I am left with the decision to either abandon unbounded utility maximization or indulge in the craziness of infinite ethics.
How about, for example, assigning .5 probability to a bounded utility function (U1), and .5 probability to an unbounded (or practically unbounded) utility function (U2)? You might object that taking the average of U1 and U2 still gives an unbounded utility function, but I think the right way to handle this kind of value uncertainty is by using a method like the one proposed by Bostrom and Ord, in which case you ought to end up spending roughly half of your time/resources on what U1 says you should do, and half on what U2 says you should do.
I haven’t studied all the discussions on the parliamentary model, but I’m finding it hard to understand what the implications are, and hard to judge how close to right it is. Maybe it would be enlightening if some of you who do understand the model took a shot at answering (or roughly approximating the answers to) some practice problems? I’m sure some of these are underspecified and anyone who wants to answer them should feel free to fill in details. Also, if it matters, feel free to answer as if I asked about mixed motivations rather than moral uncertainty:
I assign 50% probability to egoism and 50% to utilitarianism, and am going along splitting my resources about evenly between those two. Suddenly and completely unexpectedly, Omega shows up and cuts down my ability to affect my own happiness by a factor of one hundred trillion. Do I keep going along splitting my resources about evenly between egoism and utilitarianism?
I’m a Benthamite utilitarian but uncertain about the relative values of pleasure (measured in hedons, with a hedon calibrated as e.g. me eating a bowl of ice cream) and pain (measured in dolors, with a dolor calibrated as e.g. me slapping myself in the face). My probability distribution over the 10-log of the number of hedons that are equivalent to one dolor is normal with mean 2 and s.d. 2. Someone offers me the chance to undergo one dolor but get N hedons. For what N should I say yes?
I have a marshmallow in front of me. I’m 99% sure of a set of moral theories that all say I shouldn’t be eating it because of future negative consequences. However, I have this voice telling me that the only thing that matters in all the history of the universe is that I eat this exact marshmallow in the next exact minute and I assign 1% probability to it being right. What do I do?
I’m 80% sure that I should be utilitarian, 15% sure that I should be egoist, and 5% sure that all that matters is that egoism plays no part in my decision. I’m given a chance to save 100 lives at the price of my own. What do I do?
I’m 100% sure that the only thing that intrinsically matters is whether a light bulb is on or off, but I’m 60% sure that it should be on and 40% sure that it should be off. I’m given an infinite sequence of opportunities to flip the switch (and no opportunity to improve my estimates). What do I do?
There are 1000 people in the universe. I think my life is worth M of theirs, with the 10-log of M uniformly distributed from −3 to 3. I will be given the opportunity to either save my own life or 30 other people’s lives, but first I will be given the opportunity to either save 3 people’s lives or learn the exact value of M with certainty. What do I do?
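As a reference point for the hedon/dolor problem above, here is what straight expected-utility maximization says if one takes the stated lognormal distribution at face value; whether the parliamentary model endorses this tail-dominated answer is part of what the question is probing.

```python
import math

mu, sigma = 2.0, 2.0    # distribution over log10(hedons equivalent to one dolor), as stated
ln10 = math.log(10)

# If X ~ Normal(mu, sigma) is the log10 of the hedon-equivalent of one dolor,
# then 10**X is lognormal with E[10**X] = exp(mu*ln10 + (sigma*ln10)**2 / 2).
expected_hedons_per_dolor = math.exp(mu * ln10 + (sigma * ln10) ** 2 / 2)

print(f"Break-even N under naive EU maximization: about {expected_hedons_per_dolor:.2e} hedons")
# Roughly 4e6: the answer is dominated by the small probability that a dolor
# is worth vastly more than the median guess of 100 hedons.
```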
Why spend only half on U1? Spend (1 - epsilon). And write a lottery ticket giving the U2-oriented decision maker the power with probability epsilon. Since epsilon times infinity = infinity, you still get infinite expected utility (according to U2). And you also get pretty close to the max possible according to U1.
Infinity has uses even beyond allocating hotel rooms. (HT to A. Hajek)
Of course, Hajek’s reasoning also makes it difficult to locate exactly what it is that U2 “says you should do”.
In general, it should be impossible to allocate 0 to U2 in this sense. What's the probability that an angel comes down and magically forces you to do the U2 decision? Around epsilon, I'd say.
U2 then becomes totally meaningless, and we are back with a bounded utility function.
That can’t be right. What if U1 says you ought to buy an Xbox, then U2 says you ought to throw it away? Looks like a waste of resources. To avoid such wastes, your behavior must be Bayesian-rational. That means it must be governed by a utility function U3. What U3 is defined by the parliamentary model? You say it’s not averaging, but it has to be some function defined in terms of U1 and U2.
We’ve discussed a similar problem proposed by Stuart on the mailing list and I believe I gave a good argument (on Jan 21, 2011) that U3 must be some linear combination of U1 and U2 if you want to have nice things like Pareto-optimality. All bargaining should be collapsed into the initial moment, and output the coefficients of the linear combination which never change from that point on.
Right, clearly what I said can’t be true for arbitrary U1 and U2, since there are obvious counterexamples. And I think you’re right that theoretically, bargaining just determines the coefficients of the linear combination of the two utility functions. But it seems hard to apply that theory in practice, whereas if U1 and U2 are largely independent and sublinear in resources, splitting resources between them equally (perhaps with some additional Pareto improvements to take care of any noticeable waste from pursuing two completely separate plans) seems like a fair solution that can be applied in practice.
(ETA side question: does your argument still work absent logical omniscience, for example if one learns additional logical facts after the initial bargaining? It seems like one might not necessarily want to stick with the original coefficients if they were negotiated based on an incomplete understanding of what outcomes are feasible, for example.)
My thoughts:
You do always get a linear combination.
I can’t tell what that combination is, which is odd. The non-smoothness is problematic. You run right up against the constraints—I don’t remember how to deal with this. Can you?
If you have N units of resources which can be devoted to either task A or task B, the ratios of resource used will be the ratio of votes.
I think it depends on what kind of contract you sign. So if I sign a contract that says "we decide according to this utility function", you get something different than with a contract that says "we vote yes in these circumstances and no in those circumstances". The second contract you can renegotiate, and that can change the utility function.
ETA:
In the case where utility is linear in the set of decisions that go to each side, for any Pareto-optimal allocation that both parties prefer to the starting (random) allocation, you can construct a set of prices that is consistent with that allocation. So you're reduced to bargaining, which I guess means Nash arbitration.
I don’t know how to make decisions under logical uncertainty in general. But in our example I suppose you could try to phrase your uncertainty about logical facts you might learn in the future in Bayesian terms, and then factor it into the initial calculation.
These are surely really, really different things. Utilitarianism says to count people more-or-less equally. However, the sort of utility maximization that actually goes on in people’s heads typically results in people valuing their own existence vastly above that of everyone else. That is because they were built that way by evolution—which naturally favours egoism. So, their utility function says: “Me, me, me! I, me, mine!” This is not remotely like utilitarianism—which explains why utilitarians have such a hard time acting on their beliefs—they are wired up by nature to do something totally different.
Also, you probably should not say “instrumental to their own terminal values”. “Instrumental” in this context usually refers to “instrumental values”. Using it to mean something else is likely to mangle the reader’s mind.
So, I think about things like infinite ethics all the time, and it doesn’t seem to disturb me to the extent it does you. You might say, “My brain is set up such that I automatically feel a lot of tension/drama when I feel like I might be ignoring incredibly morally important things.” But it is unclear that this need be the case. I can’t imagine that the resulting strain is useful in the long run. Have you tried jumping up a meta-level, tried to understand and resolve whatever’s causing the strain? I try to think of it as moving in harmony with the Dao.
He is not alone. Consider this, for instance:
Utilitarianism is like a plague around here. Perhaps it is down to the founder effect.
We do in fact want to save worlds we can’t begin to fathom from dangers we can’t begin to fathom even if it makes us depressed or dead… but if you don’t get any satisfaction from saving the world, you might have a problem with selfishness.
That's not what I meant. What I meant is the general problem you run into when you take this stuff to its extreme. You end up saving hypothetical beings with a very low probability. That means that you might very well save no being at all, if your model was bogus. I am aware that the number of beings saved often outweighs the low probability... but I am not particularly confident in this line of reasoning, i.e. in the meta-level of thinking about how to maximize good deeds. That leads to all kinds of crazy-seeming stuff.
If it does, something almost definitely went wrong. Biases crept in somewhere between the risk assessment, the outside view correction process, the policy-proposing process, the policy-analyzing process, the policy outside view correction process, the ethical injunction check, and the “(anonymously) ask a few smart people whether some part of this is crazy” step. I’m not just adding unnatural steps; each of those should be separate, and each of those is a place where error can throw everything off. Overconfidence plus conjunction fallacy equals crazy seeming stuff. And this coming from the guy who is all about taking ideas seriously.
I don’t feel there is a need for that. You just present these things as tools, not fundamental ideas, also discussing why they are not fundamental and why figuring out fundamental ideas is important. The relevant lesson is along the lines of Fake Utility Functions (the post has “utility function” in it, but it doesn’t seem to need to), applied more broadly to epistemology.
Thinking of Bayesianism as fundamental is what made some people (e.g., at least Eliezer and me) think that fundamental ideas exist and are important. (Does that mean we ought to rethink whether fundamental ideas exist and are important?) From Eliezer’s My Bayesian Enlightenment:
(Besides, even if your suggestion is feasible, somebody would have to rewrite a great deal of Eliezer’s material to not present Bayesianism as fundamental.)
The ideas of Bayesian credence levels and maximum entropy priors are important epistemic tools that in particular allow you to understand that those kludgy AI tools won’t get you what you want.
(It doesn’t matter for the normative judgment, but I guess that’s why you wrote this in parentheses.)
I don't think Eliezer misused the idea in the sequences, as the Bayesian way of thinking is a very important tool that must be mastered to understand many important arguments. And I guess at this point we are arguing about the sense of "fundamental".
Agreed, but what I'm mostly griping about is when people who know that the utility function is a really inaccurate model still go ahead and use it, even if prefaced by some number of standard caveats. "Goal system", for example, conveys a similar abstract idea without all of the questionable and misleading technical baggage (let alone associations with "utilitarianism"), and is more amenable to case-specific caveats. I don't think we should downvote people for talking about utility functions, especially if they're newcomers, but there's a point at which we have to adopt generally higher standards for which concepts we give low K-complexity in our language.
I have a vested interest in this. All of the most interesting meta-ethics and related decision theory I've seen thus far has come from people associated with SingInst or Less Wrong. If we are to continue to be a gathering place for that kind of mind, we can't let our standards degenerate, and ideally we should be aiming for improvement. From far away it would be way easy to dismiss Less Wrong as full of naive nerds completely ignorant of both philosophy and psychology. From up close it would be easy to dismiss Less Wrong as overly confident in a suspiciously homogeneous set of philosophically questionable meta-ethical beliefs, e.g. some form of utilitarianism. The effects of such appearances are hard to calculate, and I think larger than most might intuit. (The extent to which well-meaning folk of an ideology heavily influenced by Kurzweil have poisoned the well for epistemic-hygienic or technical discussion of technological singularity scenarios, for instance, seems both very large and very saddening.)
What is giving this appearance? We have plenty of vocal commenters who are against utilitarianism, top-level posts pointing out problems in utilitarianism, and very few people actually defending utilitarianism. I really don’t get it. (BTW, utilitarianism is usually considered normative ethics, not metaethics.)
Also, utility function != utilitarianism. The fact that some people get confused about this is not a particularly good (additional) reason to stop talking about utility functions.
Here is someone just in this thread who apparently confuses EU-maxing with utilitarianism and apparently thinks that Less Wrong generally advocates utilitarianism. I’ll ask XiXiDu what gave him these impressions, that might tell us something.
ETA: The following comment is outdated. I had a gchat conversation with Wei Dai in which he kindly pointed out some ways in which my intended message could easily and justifiably have been interpreted as a much stronger claim. I'll add a note to my top-level comment warning about this.
I never proposed that people stop talking about utility functions, and twice now I’ve described the phenomenon that I’m actually complaining about. Are you trying to address some deeper point you think is implicit in my argument, are you predicting how other people will interpret my argument and arguing against that interpreted version, or what? I may be wrong, but I think it is vitally important for epistemic hygiene that we at least listen to and ideally respond to what others are actually saying. You’re an excellent thinker and seemingly less prone to social biases than most so I am confused by your responses. Am I being dense somehow?
(ETA: The following hypothesis is obviously absurd. Blame it on rationalization. It's very rare I get to catch myself so explicitly in the act! w00t!) Anyway, the people I have in mind don't get confused about the difference between reasoning about/with utility functions and being utilitarian; they just take the former as strong evidence of the latter. This doesn't happen when "utility function" is used technically or in a sand-boxed way, only when it is used in the specific way that I was objecting to. Notice how I said we should be careful about which concepts we use, not which words.
I don’t really get it either. It seems that standard Less Wrong moral philosophy can be seen at some level of abstraction as a divergence from utilitarianism, e.g. because of apparently widespread consequentialism and focus on decision theory. But yeah, you’d think the many disavowments of utilitarianism would have done more to dispel the notion. Does your impression agree with mine though that it seems that many people think Less Wrong is largely utilitarian?
I desperately want a word that covers the space I want to cover that doesn’t pattern match to incorrect/fuzzy thing. (E.g. I think it is important to remember that one’s standard moral beliefs can have an interesting implicit structure at the ethical/metaethical levels, vice versa, et cetera.) Sometimes I use “shouldness” or “morality” but those are either misleading or awkward depending on context. Are there obvious alternatives I’m missing? I used “moral philosophy” above but I’m pretty sure that’s also straight-up incorrect. Epistemology of morality is clunky and probably means something else.
Why would you want to stop people talking about human utility functions?!? People should not build economic models of humans? How are such things supposedly misleading? You are concerned people will drag in too much from Von Neumann and Morgenstern? What gives?
By contrast, the idea that humans don’t have utility functions seems to be mysterian nonsense. What sense can be made out of that idea?
As I see it, humans have revealed behavioral tendencies and reflected preferences. I share your reservations about “revealed preferences”, which if they differ from both would have to mean something in between. Maybe revealed preferences would be what’s left after reflection to fix means-ends mistakes but not other reflection, if that makes sense. But when is that concept useful? If you’re going to reflect on means-ends, why not reflect all the way?
Also note that the preferences someone reveals through programming them into a transhuman AI may be vastly different from the preferences someone reveals through other sorts of behavior. My impression is that many people who talk about “revealed preferences” probably wouldn’t count the former as authentic revealed preferences, so they’re privileging behavior that isn’t too verbally mediated, or something. I wonder if this attributing revealed preference to a person rather than a person-situation pair should set off fundamental attribution error alarms.
If we have nothing to go by except behavior, it seems like it’s underdetermined whether we should say it’s preferences or beliefs (aliefs) or akrasia that’s being revealed, given that these factors determine behavior jointly and that we’re defining them by their effects. With reflected preferences it seems like you can at least ask the person which one of these factors they identify as having caused their behavior.
Good plausible hypothesis to cache for future priming, but I’m not sure I fully understand it:
More specifically, what process are you envisioning here (or think others might be envisioning)?
We might make something someday that isn’t godshatter, and we need to practice.
I agree that reforming humans to be rational is hopeless, but it is nevertheless useful to imagine how a rational being would deal with things.
But VNM utility is just one particularly unintuitive property of rational agents. (For instance, I would never ever use a utility function to represent the values of an AGI.) Surely we can talk about rational agents in other ways that are not so confusing?
Also, I don’t think VNM utility takes into account things like bounded computational resources, although I could be wrong. Either way, just because something is mathematically proven to exist doesn’t mean that we should have to use it.
Who is sure? If you’re saying that, I hope you are. What do you propose?
I don’t think anybody advocated what you’re arguing against there.
The nearest thing I'm willing to argue for is that one of the following possibilities holds:
We use something that has been mathematically proven to exist, now.
We might be speaking nonsense, depending on whether the concepts we’re using can be mathematically proven to make sense in the future.
Since even irrational agents can be modelled using a utility function, no “reforming” is needed.
How can they be modeled with a utility function?
As explained here:
Thanks for the reference.
It seems though that the reward function might be extremely complicated in general (in fact I suspect that this paper can be used to show that the reward function can be potentially uncomputable).
The whole universe may well be computable—according to the Church–Turing–Deutsch principle. If it isn't, the above analysis may not apply.
I agree with jsteinhardt, thanks for the reference.
I agree that the reward functions will vary in complexity. If you do the usual thing in Solomonoff induction, where the plausibility of a reward function decreases exponentially with its size, then so far as I can tell you can infer reward functions from behavior, if you can infer behavior.
We need to infer a utility function for somebody if we’re going to help them get what they want, since a utility function is the only reasonable description I know of what an agent wants.
It was my impression that it was LW orthodoxy that at “reflective equilibrium”, the values and preferences of rational humans can be represented by a utility function. That is:
… if we or our AI surrogate ever reach that point, then humans have a utility function that captures what we want morally and hedonistically. Or so I understand it.
Yes, our current god-shatter-derived inconsistent values can not be described by a utility function, even as an abstraction. But it seems to me that most of the time what we are actually talking about is what our values ought to be rather than what they are. So, I don’t think that a utility function is a ridiculous abstraction—particularly for folk who strive to be rational.
Actually, yes they can. Any computable agent’s values can be represented by a utility function. That’s one of the good things about modelling using utility functions—they can represent any agent. For details, see here:
Nope. Humans do have utility functions—in this sense:
Any computable agent has a utility function. That’s the beauty of using a general theory.
A trivial sense, that merely labels what an agent does with 1 and what it doesn’t with 0: the Texas Sharpshooter Utility Function. A “utility function” that can only be calculated—even by the agent itself—in hindsight is not a utility function. The agent is not using it to make choices and no observer can use it to make predictions about the agent.
Curiously, in what appears to be a more recent version of the paper, the TSUF is not included.
Er, the idea is that you can make a utility-maximising model of the agent—using the specified utility function—that does the same things the agent does if you put it in the same environment.
Can people please stop dissing the concept of a human utility function. Correcting these people is getting tedious—and I don’t want to be boring.
Doesn't work. The Texas Sharpshooter utility function described by Dewey cannot be used to make a utility-maximising model of the agent, except by putting a copy of the actual agent into the box, seeing what it does, declaring that to have utility 1, and doing it. The step of declaring it to have utility 1 plays no role in deciding the actions. It is a uselessly spinning cog doing no more work than a suggestive name on a Lisp symbol.
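For concreteness, here is a minimal sketch (my own paraphrase, not Dewey's actual formalism) of the construction being debated: the wrapper reproduces any agent's behavior, but only because the agent itself is called to decide which action gets the label of utility 1.

```python
def make_o_maximizer(agent, actions):
    """Wrap an arbitrary agent as a 'utility maximizer' in the trivial sense discussed here."""
    def o_maximizer(observation):
        chosen = agent(observation)                               # the agent does all the work
        utility = {a: 1 if a == chosen else 0 for a in actions}   # label its choice with utility 1
        return max(actions, key=lambda a: utility[a])             # 'maximize' the just-built utility
    return o_maximizer

# Example: an agent that always picks the alphabetically first action.
actions = ["left", "right", "wait"]
agent = lambda obs: min(actions)
wrapped = make_o_maximizer(agent, actions)
assert wrapped("any observation") == agent("any observation")    # identical behavior, by construction
```

In this sketch the maximization step recovers nothing beyond the agent's own choice, which is the sense of triviality at issue in this exchange.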
I was thinking a similar thought about you. You’re the only person here that I’ve seen taking these trivial utility functions seriously.
The idea here is that—if the agent is computable—then it can be simulated by any other computable system. So, if the map between its inputs and state, and its motor output is computable then we can make another computable system which produces the same map—since all universal computing systems can simulate each other by virtue of being Turing complete (and systems made of e.g. partial recursive functions can simulate each other too—if they are given enough memory to do so).
I mentioned computability at the top, by saying: “any computable agent has a utility function”.
As far as anyone can tell, the whole universe is computable.
I don't see how this bears on the possibility of modelling every agent by a utility-maximising agent. Dewey's construction doesn't work. Its simulation of an agent by a utility-maximising agent just uses the agent to simulate itself and attaches the label "utility=1" to its actions.
Dewey says pretty plainly: “any agents can be written in O-maximizer form”.
O-maximisers are just plain old utility maximisers. Dewey rechristens them “Observation-Utility Maximizers” in his reworked paper.
He makes an O-maximiser from an agent, A. Once you have the corresponding O-maximiser, the agent A could be discarded.
I know that he says that. I am saying, I thought pretty plainly, that I disagree with him.
He only does that in the earlier paper. His construction is as I described it: define O as doing whatever A does and label the result with utility 1. A is a part of O and cannot be discarded. He even calls this construction trivial himself, but underrates its triviality.
I don't really understand which problem you are raising. If the O-maximiser eventually contains a simulated copy of A—so what? O is still a utility-maximiser that behaves the same way that A does if placed in the same environment.
The idea of a utility maximiser as used here is that it assigns utilities to all its possible actions and then chooses the action with the highest utility. O does that—so it qualifies as a utility-maximiser.
O doesn’t assign utilities to its actions and then choose the best. It chooses its action (by simulating A), labels it with utility 1, and chooses to perform the action it just chose. The last two steps are irrelevant.
"Irrelevant"? If it didn't perform those steps, it wouldn't be a utility maximiser, and then the proof that you can build a utility maximiser which behaves like any computable agent wouldn't go through. Those steps are an important part of the reason for exhibiting this construction in the first place.
I think that everyone understands the point you’re trying to make—you can usefully model people as having a utility function in a wide variety of cases—but very often people use such models unskillfully, and it causes people like me to facepalm. If you want to model a lot of humans, for instance, it’s simple and decently accurate to model them as having utility functions. Economics, say. And if you have something like AIXI, or as Dawkins might argue a gene, then a utility function isn’t even a model, it’s right there in front of you.
I hypothesize that the real trouble starts when a person confuses the two; he sees or imagines a Far model of humans with utility functions, zooms in on an individual human or zooms in on himself, and thinks he can see the real utility function sitting right there in front of him, like he could with AIXI. Yeah, he knows in the abstract that he doesn’t have direct access to it, but it feels Near. This can lead to a lot of confusion, and it leads people like me to think folk shouldn’t talk about a person’s “utility function” except in cases where it obviously applies.
Even where you can say "Person A has a utility function that assigns 4 utility to getting cheesecake and 2 utility to getting paperclips", why not say "Agent A"? But that's not what I facepalm at. I only facepalm when people say they got their "utility function" from natural selection (i.e. ignoring memes), or say they wish they could modify their utility function, et cetera. In many cases it works as an abstraction, but if you're not at all thinking about EU, why not talk directly about your preferences/values? It's simpler and less misleading.
This seems like a bit of a different issue—and one that I am not so interested in.
A couple of comments about your examples, though:
For someone like me it is pretty accurate to say that I got my utility function from natural selection acting on DNA genes. Memes influence me, but I try not to let them influence my goals. I regard them as symbiotes: mutualists and pathogens. In principle they could do deals with me that might make me change my goals—but currently I have a powerful bargaining position, their bargaining position is typically weak—and so I just get my way. They don't get to affect my goals. Those that try get rejected by my memetic immune system. I do not want to become the victim of a memetic hijacking.
As for the implied idea that natural selection does not apply to memes, I’ll try to bite my tongue there.
That seems closely equivalent to me. The cases where people talk about utility functions are mostly those where you want to compare with machines, or conjure up the idea of an expected utility maximiser for some reason. Sometimes even having "utility" in the context is enough for the conversation to wander on to utility functions.
My counsel would be something like: "Don't like it? Get used to it!" There is not, in fact, anything wrong with it.
That totally wasn’t what I meant to imply. I am definitely a universal Darwinist. (You can view pretty much any optimization process as “evolution”, though, so in some cases it’s questionably useful. Bayesian updating is just like population genetics. But with memes it’s obviously a good description.)
Yes, but I think you’re rather unusual in this regard; most people aren’t so wary of memes. Might I ask why you prefer genes to memes? This seems odd to me. Largely because humans evolved for memes and with memes. Archetypes, for example. But also because the better memes seem to have done a lot of good in the world. (My genetically evolved cognitive algorithms—that is, the algorithms in my brain that I think aren’t the result of culture, but instead are universal machinery—stare in appreciation at the beauty of cathedrals, and are grateful that economies make my life easier.)
That’s why I tried to bite my tongue—but it was difficult to completely let it go by...
Well, I love memes, but DNA-genes built 99% of my ancestors unassisted, and are mostly responsible for building me. They apparently equipped me with a memetic immune system, for weeding out undesirable memes, to allow me to defend myself in those cases where there is a conflict of interests.
Why should I side with the memes? They aren’t even related to me. The best of them are beneficial human symbionts—rather like lettuces and strawberries. I care for them some—but don’t exactly embrace their optimisation targets as my own.
I don’t dispute memes have done a lot of good things in the world. So has Mother Teresa—but that doesn’t mean I have to adopt her goals as my own either.
I know what I want based on naive introspection. If you want to have preferences other than those based on naive introspection, then one of your preferences, based on naive introspection, is not to have preferences that are based on naive introspection. I am not sure how you think you could ever get around intuition; can you please elaborate?
Naive introspection is an epistemic process; it’s one kind of algorithm you can run to figure out aspects of the world, in this case your mind. Because it’s an epistemic process we know that there are many, many ways it can be suboptimal. (Cognitive biases come to mind, of course; Robin Hanson writes a lot about how naive introspection and actual reasons are very divergent. But sheer boundedness is also a consideration; we’re just not very good Bayesians.) Thus, when you say “one of your preferences, based on naive introspection, is not to have preferences that are based on naive introspection,” I think:
If my values are what I think they are,
I desire to believe that my values are what I think they are;
If my values aren’t what I think they are,
I desire to believe that my values aren’t what I think they are;
Let me not become attached to values that may not be.
Agree completely. (Even though I am guilty of using the word myself below.) But most of this post seems to be based on linearity of preference, which imho can usually only be justified by muddling around with utilities. So maybe that is the place to start?
EDIT: To clarify, I mean that maybe the reason to reject Person 1’s argument is that it implicitly appeals to notions of utility when claiming you should maximize expected DALYs.
I agree with most of what you say here; is your comment referring to my post and if so which part?
Not referring to your post, no, just some aspects of some of the comments on it and the memetic ecology that enables those aspects. I’ll add a meta tag to my comment to make this clearer.
Because rational agents care about whatever the hell they want to care about. I, personally, choose to care about my abstract ‘utility function’ with the clear implication that said utility function is something that must be messily constructed from godshatter preferences. And that’s ok because it is what I want to want.
No. It is a useful abstraction. Not using utility function measures does not appear to improve abstract decision making processes. I’m going to stick with it.
Eliezer’s original quote was better. Wasn’t it about superintelligences? Anyway you are not a superintelligence or a rational agent and therefore have not yet earned the right to want to want whatever you think you want to want. Then again I don’t have the right to deny rights so whatever.
I wasn’t quoting Eliezer; I made (and stand by) a plain English claim. It does happen to be similar in form to a recent instance of Eliezer summarily rejecting PhilGoetz’s declaration that rationalists don’t care about the future. That quote from Eliezer was about “expected-utility-maximising agents”, which would make it rather inappropriate in this context.
I will actually strengthen my declaration to:
Because agents can care about whatever the hell they want to care about. (This too should be uncontroversial.)
An agent does not determine its preferences by mere vocalisation, nor does its belief about its preferences intrinsically make them so. Nevertheless I do care about my utility function (with the vaguely specified caveats). If you could suggest a formalization sufficiently useful for decision making that I could care about it even more than my utility function then I would do so. But you cannot.
No, you don’t. The only way you could apply limits on what I want is via physically altering my molecular makeup. As well as being rather difficult for you to do on any significant scale I could credibly claim that the new physical configuration you constructed from my atoms is other than ‘me’. You can’t get much more of a fundamental destruction of identity than by changing what an agent wants.
I don’t object to you declaring that you don’t have or don’t want to have a utility function. That’s your problem not mine. But I will certainly object to any interventions made that deny that others may have them.
This stands out as problematic, since there’s no plausible consequentialist argument for this from a steel-manned Person 1. Person 1 is both arguing for the total dominance of total utilitarian considerations in Person 2’s decision-making, and separately presenting a bogus argument about what total utilitarianism would recommend. Jennifer’s comment addresses the first prong, while the following paragraphs address the second.
If one finds oneself in a situation where one doesn’t know of any other courses of action within a few orders of magnitude in cost-effectiveness, that’s a sign that one is radically underinformed about the topic and should be learning more before acting. For instance, considerations about large possible future populations/astronomical waste increase the expected value of any existential risk reduction, from asteroids to nukes to bio to AI. For any specific risk there are many different ways, direct and indirect, to try to address it.
Fermi calculations like the one in your post can easily show the strong total utilitarian case (that is, the case within the particular framework of total utilitarianism) for focus on existential risk, but showing that a highly specific “course of action” addressing a specific risk is better than alternative ways to reduce existential risk is not robust to shifts of many orders of magnitude in probability. Even if I am confident in the value of, e.g. asteroid tracking, there are still numerous implementation details that could (collectively) swing cost-effectiveness by a few orders of magnitude, and I can’t avoid that problem in the fashion of your Person 1.
I agree with most of what you say here. Maybe it satisfactorily answers the questions raised in my post; I’ll spend some time brooding over this.
Here it would be good to compile a list; I myself am very much at a loss as to what the available options are.
I have such lists, but by the logic of your post it sounds like you should gather them yourself so you worry less about selection bias.
I would love to study these lists! Would you mind sending me them? ( My email: myusername@gmx.de )
I think there are two things going on here:
more importantly, your utility probably doesn’t scale linearly with DALYs, if for no other reason than that you don’t care very much about things that happen at very low probabilities
less importantly, abstract arguments are much less likely to be correct than they seem at face value. Likelihood of correctness decreases exponentially in both argument length and amount of abstraction, and it is hard for us to appreciate that intuitively.
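A minimal worked illustration of that second point, under my own simplifying assumption (not the commenter’s) that an argument is a conjunction of n independent steps, each correct with probability q:

$$P(\text{argument correct}) = \prod_{i=1}^{n} P(\text{step}_i \text{ correct}) = q^{n}, \qquad \text{e.g. } 0.9^{10} \approx 0.35, \quad 0.9^{20} \approx 0.12.$$

Even high per-step confidence compounds into a low overall probability, which is exactly the thing that is hard to appreciate intuitively.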
Thanks.
My life satisfaction certainly does not scale linearly with DALYs (e.g. averting the destruction of 1000 DALYs does not make me ten times as happy as averting the destruction of 100 DALYs), but it does seem to be very much influenced by whether I have a sense that I’m “doing the right thing” (whatever that means).
But maybe you mean utility in some other sense than life satisfaction.
If I had the choice of pushing one of 10 buttons, each of which had a different distribution of probabilities attached to magnitudes of impact, I think I would push the aggregate utility maximizing one regardless of how small the probabilities were. Would this run against my values? Maybe; I’m not sure.
I agree; I’ve been trying to formulate this intuition in quasi-rigorous terms and have not yet succeeded in doing so.
Well I am talking about the utility defined in the VNM utility theorem, which I assumed is what the term was generally taken to mean on LW, but perhaps I am mistaken. If you mean something else by utility, then I’m unsure why you would “push the aggregate utility maximizing one” as that choice seems a bit arbitrary to me to be a hard and fast rule (except for VNM utility, since VNM utility is by definition the thing whose expected value you maximize).
Would you care to share your intuitions as to why you would push the utility maximizing button, and what you mean by utility in this case? (A partial definition / example is fine if you don’t have a precise definition.)
Does that apply to AI going FOOM?
To me the claim that human-level AI → superhuman AI in at most a matter of years seems quite likely. It might not happen, but I think the arguments about FOOMing are pretty straightforward, even if not airtight. The specific timeline depends on where on the scale of Moore’s law we are (so if I thought that AI was a large source of existential risk, then I would be trying to develop AGI as quickly as possible, so that the first AGI was slow enough to stop if something bad happened; i.e. waiting longer → computers are faster → FOOM happens on a shorter timescale).
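A back-of-the-envelope version of that parenthetical, using an assumed ~18-month hardware doubling time (my number, not the comment’s): delaying the first AGI by t years means it starts out on roughly 2^(t/1.5) times as much hardware, which is the sense in which waiting longer compresses the takeoff timescale.

$$\text{overhang factor} \approx 2^{t/1.5}, \qquad t = 10 \text{ years} \;\Rightarrow\; 2^{10/1.5} \approx 100.$$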
The argument I am far more skeptical of is about the likelihood of an UFAI happening without any warning. While I place some non-negligible probability on UFAI occurring, it seems like right now we know so little about AI that it is hard to judge whether an AI would actually have a significant danger of being unfriendly. By the time we are in any position to build an AGI, it should be much more obvious whether that is a problem or not.
Might you clarify your question?
Depending on what you meant, this might not be relevant, but. Many arguments about AGI and FOOM are antipredictions. “Argument length” as jsteinhardt used it assumes that the argument is a conjunctive one. If an argument is disjunctive then its length implies an increased likelihood of correctness. Eliezer’s “Hard Takeoff” article on OB was pretty long, but the words were used to make an antiprediction.
It is not clear to me that there are well-defined boundaries between what you call a conjunctive and a disjunctive argument. I am also not sure how two opposing predictions are not both antipredictions.
I see that some predictions are more disjunctive than others, i.e. just some of their premises need to be true. But most of the time this seems to be a result of vagueness. It doesn’t necessarily speak in favor of a prediction if it is strongly disjunctive. If you were going to pin it down it would turn out to be conjunctive, requiring all its details to be true.
All predictions are conjunctive:
If you predict that Mary is going to buy one of a thousand products in the supermarket 1) if she is hungry, 2) if she is thirsty, or 3) if she needs a new coffee machine, then you are seemingly making a disjunctive prediction. But someone else might be less vague and make a conjunctive antiprediction: Mary is not going to buy one of a thousand products in the supermarket, because for that to happen 1) she needs money, 2) she has to have some needs, and 3) the supermarket has to be open. Sure, if the latter prediction was made first, then the former would become the antiprediction, which happens to be disjunctive. But being disjunctive does not speak in favor of a prediction in and of itself.
All predictions are antipredictions:
Now you might argue that the first prediction could not be an antiprediction, as it does predict something to happen. But opposing predictions are always predicting the negation of each other. If you predict that Mary is going shopping then you predict that she is not not going shopping.
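For what it’s worth, the conjunctive/disjunctive distinction can be made concrete with two elementary bounds (my framing, not part of the comment): adding conjuncts can only lower a prediction’s probability, while adding disjuncts can only raise it; the dispute above is really about how a given informal prediction gets carved into claims.

$$P(A_1 \land \cdots \land A_n) \le \min_i P(A_i), \qquad P(A_1 \lor \cdots \lor A_n) \ge \max_i P(A_i).$$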
I’d reverse the importance of those two considerations. Even though my utility doesn’t scale linearly with DALYs, I wish it did.
Why do you wish it did?
My actual utility, I think, does scale with DALYs, but my hedons don’t. I’d like my hedons to match my utilons so that I can maximize both at the same time (I prefer by definition to maximize utilons if I have to pick, but this requires willpower).
Er I understand that utility != pleasure, but again, why does your utility scale linearly with DALYs? It seems like the sentiments you’ve expressed so far imply that your (ideal) utility function should not favor your own DALYs over someone else’s DALYs, but I don’t see why that implies a linear overall scaling of utility with DALYs.
If I think all DALYs are equally valuable, I should value twice as many twice as much. That’s why I’d prefer it to be linear.
If by value you mean “place utility on” then that doesn’t follow. As I said, utility has to do (among many other things) with risk aversion. You could be willing to pay twice as many dollars for twice as many DALYs and yet not place twice as much utility on twice as many DALYs. If we normalize so that 1 DALY = 1 utilon, then the utility of x DALYs is by definition 1/p, where p is the probability at which you would be exactly indifferent between keeping 1 DALY and trading it for a p chance of x DALYs.
Again, having all DALYs be equally valuable doesn’t mean that your utility function scales linearly with DALYs; you could have a utility function that is, say, sqrt(# DALYs), and this would still value all DALYs equally. Although also see Will_Newsome’s comments elsewhere about why talking about things in terms of utility is probably not the best idea anyway.
If by utility you meant something other than VNM utility, then I apologize for the confusion (although as I pointed out elsewhere, I would then take objection to claims that you should maximize its expected value).
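Spelling out the two claims above in symbols (normalization and notation mine): the indifference condition defines the utility of x DALYs, and a concave function like the square root still treats every DALY symmetrically without scaling linearly.

$$p \cdot u(x \text{ DALYs}) = u(1 \text{ DALY}) = 1 \;\Rightarrow\; u(x \text{ DALYs}) = \frac{1}{p}; \qquad u(n) = \sqrt{n} \;\Rightarrow\; u(2n) = \sqrt{2}\, u(n) \approx 1.41\, u(n).$$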
I’m afraid my past few comments have been confused. I don’t know as much about my utility function as I wish I did. I think I am allowed to assign positive utility to a change in my utility function, and if so then I want my utility function to be linear in DALYs. It probably is not so already.
I think we may be talking past each other (or else I’m confused). My question for you is whether you would (or wish you would) sacrifice 1 DALY in order to have a 1 in 10^50 chance of creating 1+10^50 DALYs. And if so, then why?
(If my questions are becoming tedious then feel free to ignore them.)
I don’t trust questions involving numbers that large and/or probabilities that small, but I think so, yes.
Probably good not to trust such numbers =). But can you share any reasoning or intuition for why the answer is yes?
If one accepts any kind of multiverse theory, even just Level I, then an infinite number of sentient organisms already exist, and it seems that we cannot care about each individual equally without running into serious problems. I previously suggested that we discount each individual using something like the length of its address in the multiverse.
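A sketch of how such a discount keeps totals finite (the specific weighting is my illustration of the proposal, not a quote from it): if individual i is picked out by a shortest binary address of length ℓ(i), and the addresses form a prefix-free code, then weighting by 2^(−ℓ(i)) gives

$$w(i) \propto 2^{-\ell(i)}, \qquad \sum_i 2^{-\ell(i)} \le 1,$$

so even an infinite population carries a bounded total weight, and individuals whose addresses are as long as full descriptions of their minds get next to none of it.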
Perhaps a good moment to point out that egoists don’t have to bother with such bizarre weirdness.
And a nihilist doesn’t have to bother with anything...
I would continue—except that I don’t think utilitarians need to bother with such bizarre weirdness either. Instrumental discounting is automatic, and neatly takes care of distant agents.
Provably so?
If not, there almost certainly exist failure modes.
That is not a very useful argument style. I can’t prove that conservation of energy works throughout the universe—but should not leap from there to “there almost certainly exist failure modes”.
Conservation of energy in large systems can be proved reductively, from the properties of the subsystems.
Similarly, most true facts about decision problems can be proved from a model of what kind of structures can be decision problems.
It then becomes an empirical question whether other kinds of substructures or decision problems exist.
EDIT: Suppose you get in a conversation with a Cunning Philosopher. He comes up with a clever philosophical example designed to expose a flaw in your theory. You point out that the example doesn’t work, that there is some problem in it. He comes up with another example, dealing with that problem. You point out that...
Why should you expect this process to terminate with him running out of ideas?
Now suppose you get in a conversation with the Cunning Perpetual Motion Machine Crank. He comes up with a clever machine designed to violate conservation of energy. You know, because of a proof, that he must be calculating as though one of the parts doesn’t work the way physics said it does. You only need to find this part. There is no way for him to win—except by empirically proving one of the assumptions in the proof invalid.
Good thing we are not discounting individuals by the length of the inferential distance between them and Average Joe.
Do we have any numbers on how many people on LW agree with you, or which people?
I’m fuzzy about the whole thing, but a feature that I think I like about the proposal is that it gives you a nicely-behaved way to deal with the problem of how to value lives lived in extremely complex interpretations of rocks. And if someone lives so far away in space or time that just to locate him requires as much information as it would to specify his whole mind starting from a rock, it’s not obvious to me that he exists in a sense in which the rock-mind does not.
I don’t think there’s anything wrong with valuing people who live in contrived interpretations of rocks; you just can’t interact with them, and whatever it is you observe is usually more of a collection of snapshots than a relevant narrative. Also, destroying the rock only destroys part of your contrived device for observing facts about those people, unless you value the rock itself.
It’s good to see that panpsychism is finally getting the attention it rightfully deserves!
“far away” from what?
If you use your current location as a reference point then the theory becomes non-updateless and incoherent and falls apart. You don’t “get” any starting point when you try to locate someone.
I think the universe implicitly defines a reference point in the physics. By way of illustration, I think Tegmark sometimes talks about an inflation scenario where an actually infinite space is the same as a finite bubble that expands from a definite point, but with different coordinates that mix up space and time; and in that case I think that definite point would be algorithmically privileged. But I’m even fuzzier on all this than before.
I think the focus on a physical reference point here seems misguided. Perhaps more conceptually well-founded would be something like a search for a logical reference point, using your existence in some form at some level of abstraction and your reasoning about that logical reference point both as research of and as evidence about attractors in agentspace, via typical acausal means.
Vladimir Nesov’s decision theory mailing list comments on the role of observational uncertainty in ambient-like decision theories seem relevant. Not to imply he wouldn’t think what I’m saying here is complete nonsense.
In one of my imaginable-ideal-barely-possible worlds, Eliezer’s current choice of “thing to point your seed AI at and say ‘that’s where you’ll find morality content’” was tentatively determined to be what it currently nominally is (instead of tempting alternatives like “the thing that makes you think that your proposed initial dynamic is the best one” or “the thing that causes you to care about doing things like perfecting things like the choice of initial dynamic” or something) after he did a year straight of meditation on something like the lines of reasoning I suggest above, except honed to something like perfection-given-boundedness (e.g. something like the best you could reasonably expect to get at poker given that most of your energy has to be put into retaining your top 5 FIDE chess rating while writing a bestselling popular science book).
I think it depends on the physics. Some have privileged points, some don’t.
But surely given any scheme to assign addresses in an infinite universe, for every L there’s a finite bubble of the universe outside of which all addresses are at least L in length?
If a universe is tiled with a repeating pattern, then you can assign addresses to parts of the pattern, each address corresponding to an infinite number of points.
I don’t know how this applies to other universes.
If hypothetically our universe had a privileged point, what would you do if you discovered you were much farther away from it than average?
Naively, you wouldn’t use some physical location, but instead logical descriptions in the space of algorithms given axioms you predict others will predict are Schelling points (using your own (your past) architecture/reasoning as evidence of course).
Naively, this is a question of ethics and not game theory, so I don’t see why Schelling points should enter into it.
I thought “Schelling point” was used by the decision theory workshop folk, I may be wrong. Anyway, decision theory shares many aspects of cooperative game theory as pointed out by Wei Dai long ago, and many questions of ethics must be determined/resolved/explored by such (acausal) cooperation/control.
Relevance? (That people in group Y use a word doesn’t obviously clarify why you used it.)
I mistakenly thought that Will Sawin was in said group and was thus expressing confusion that he wasn’t already familiar with its broader not-quite-game-theoretic usage, or at least what I perceived to be a broader usage. Our interaction is a lot more easily interpreted in that light.
(I didn’t understand what you meant either when I wrote that comment, now I see the intuition, but not a more technical referent.)
And if you meant that you don’t see a more technical referent for my use of Schelling point then there almost certainly isn’t one, and thus it could be claimed that I was sneaking in technical connotations with my naive intuitions. Honestly I thought I was referring to a standard term or at least concept, though.
The term is standard, it was unclear how it applies, the intuition I referred to is about how it applies.
Can you explain that intuition to me or point me to a place where it is explained or something?
Or, alternately, tell me that the intuition is not important?
Two agents in a PD can find a reason to cooperate in proving (deciding) that their decision algorithms are equivalent to some third algorithm that is the same for both agents (in which case they can see that their decision is the same, and so (C,C) is better than (D,D)). This common algorithm could be seen as a kind of focal point that both agents want to arrive at.
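A toy sketch of that mechanism (everything here, including the “prove equivalence by comparing source code” shortcut, is my simplification rather than anyone’s actual decision theory): two agents cooperate exactly when they can verify they are running the same third algorithm, so the reachable outcomes are the symmetric ones, and the shared algorithm picks (C,C) over (D,D).

```python
import inspect


def shared_decision_algorithm(own_source: str, other_source: str) -> str:
    """Cooperate iff both players provably run this very algorithm.

    Comparing source text is a stand-in for the real requirement
    (a proof that the two decision procedures are equivalent).
    """
    return "C" if own_source == other_source else "D"


class Agent:
    def __init__(self) -> None:
        # Each agent carries (and can exhibit) the code it will run.
        self.source = inspect.getsource(shared_decision_algorithm)

    def play(self, opponent: "Agent") -> str:
        return shared_decision_algorithm(self.source, opponent.source)


a, b = Agent(), Agent()
print(a.play(b), b.play(a))  # C C: both arrive at the common "focal" algorithm
```

The hard part, which the comparison-by-string elides, is establishing equivalence between decision procedures that are not literally identical.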
I don’t think it matters much, but the specific agents I had in mind were perhaps two subagents/subalgorithms (contingent instantiations? non-Platonic instantiations?) both “derived” (logically/acausally) from some class of variably probable unknown-to-them but less-contingent creator agents/algorithms (and the subagents have a decision theory that ‘cares’ about creator/creation symmetry or summat, e.g., causally speaking, there should be no arbitrary discontinuous decision policy timestamping). There may be multiple possible focal points and it may be tricky to correctly treat the logical uncertainty.
All of that to imply that the focus shouldn’t be determining some focal point for the universe, if that means anything, but focal points in algorithmspace, which is probably way more important.
Ah, I see.
(I, on the other hand, don’t.)
You’ve talked about similar things yourself in the context of game semantics / abstract interpretation / time-symmetric perceptions/actions. I’d be interested in Skype convo-ing with you now that I have an iPhone and thus a microphone. I’m very interested in what you’re working on, especially given recent events. Your emphasis on semantics has always struck me as well-founded. I have done a fair amount of speculation about how an AI (a Goedel machine, say) crossing the ‘self-understanding’/‘self-improving’/Turing-universal/general-intelligence/semantic boundary would transition from syntactic symbol manipulator to semantic goal optimizer and what that would imply about how it would interpret the ‘actual’ semantics of the Lisp tokens that the humans would identify as its ‘utility function’. If you don’t think about that much then I’d like to convince you that you should, considering that it is on the verge of technicality and also potentially very important for Shulman-esque singularity game theory.
The idea is that having exactly the same or similar algorithms to other agents is enormously good, due to a proliferation of true PDs, and that therefore even non-game-theoretic parts of algorithms should be designed, whenever possible, to mimic other agents.
However applying this argument to utility functions seems a bit over-the-top. Considering that whether or not something is a PD depends on your utility function, altering the utility function to win at PDs should be counter-productive. If that makes sense, we need better decision theories.
The intuition that “Schelling points” are an at all reasonable or non-bastardized way of thinking about this, or the intuition behind the “this” I just mentioned? If the latter, I did preface it with “naively”, and I fully disclaim that I do not have a grasp of the technical aspects, just aesthetics which are hard to justify or falsify, and the only information I pass on that might be of practical utility to folk like you or Sawin will be ideas haphazardly stolen from others and subsequently half-garbled. If you weren’t looking closely, you wouldn’t see anything, and you have little reason to look at all. Unfortunately there is no way for me to disclaim that generally.
link? explanation? something of that nature?
EDIT: Private message sent instead of comment reply.
I intuit that the difference between logical and observational uncertainty could be relevant in non-obvious ways. Anyway, this sort of thinking seems obviously correct, but I fear the comparison may mislead some, considering that inferring the numbers and preferences of minds in causally disconnected parts of the multiverse through sheer logical reasoning is probably way way way easier than interpreting the ‘strength’/‘existence’ and preferences of minds in rocks, at least as I consider it. (I worded that so poorly that it’s incoherent as explicitly stated but I think the message is intact.)
Replies to questions:
Yes.
Yes.
The problem arises only if one assumes that “a model not obviously wrong” shouldn’t have a probability below some threshold which is independent of the model. Hence, to reconcile these things, one should drop this assumption. Alternatively, one may question the “not obviously wrong” part.
Remarks:
The existence of a threshold p0, a minimal probability of any statement being true, is clearly inconsistent, since for any value of p0 there are more than 1/p0 incompatible statements. Therefore some qualifier such as “not obviously wrong” is added as a requirement for the statement. But our detection of obvious wrongness is far from reliable. It is far easier to generate an unlikely good-sounding theory than to justify assigning it the low probability it deserves (the former requires making up one theory, the latter entails creating all incompatible theories from the same plausibility class). I believe that most people who accept the example argument would still find more than 1/p0 incompatible statements “not obviously wrong”. That makes such people susceptible to being Pascal-wagered.
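The inconsistency in symbols (just restating the point above): suppose every “not obviously wrong” statement must get probability at least p0, and let S_1, ..., S_N be mutually exclusive statements with N > 1/p0 that all pass the filter. Then

$$\sum_{i=1}^{N} P(S_i) \ge N p_0 > \frac{1}{p_0} \cdot p_0 = 1,$$

which violates the requirement that the probabilities of mutually exclusive statements sum to at most 1.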
I reject the example argument because I don’t care a bit about simulated universes, and I don’t even feel a need to pretend that I do (even on LW). But I could be, in principle, Pascal-wagered by some other similar argument. For example, I would care about hell, if it existed.
It seems to me that the only reliable defense against Pascal wagers is either to have bounded utility or to let the utility influence the probability estimates. Bounded utility sounds less weird.
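For concreteness, one standard bounded form (my example, not the commenter’s): with a utility that saturates at U_max, say

$$u(x) = U_{\max}\left(1 - e^{-x/c}\right),$$

the extra expected utility any probability-p offer can provide over the status quo is at most p·U_max, so no promised payoff, however astronomical, can outweigh a sufficiently small p.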
Disability-adjusted life year.
The easiest answer is that nobody is seriously anything even remotely approaching utilitarian. Try writing down your utility function in even some very limited domain, and you’ll see that yourself.
Utilitarianism is a mathematical model that has very convenient mathematical properties, and has enough numbers to tweak available that you can use it to analyze some very simple situations (see the entire discipline of economics). It breaks very quickly when you push it a little.
And seriously, the exercise of writing down a point system of what is worth how many utility points to you is really eye-opening; I wrote a post on LessWrong about it ages ago if you’re interested.
Here’s a link to Dawrst’s main page. I find this article on vegetarianism to be particularly interesting—though perhaps in a different way than Dawrst intended—and it’s perhaps one of few ‘traditional’ utilitarian arguments that has contributed to me changing how I thought about day-to-day decisions. I haven’t re-evaluated that article since I read it 6 months ago though.
It seems that if you accept this, you really ought to go accept Pascal’s Wager as well, since a lot of smart people believe in God.
It seems like an extraordinary leap to accept that the original numbers are within 5 orders of magnitude, unless you’ve actually been presented with strong evidence. Humans naturally suck at estimating these sorts of things absent evidence (see again, mass belief in One True God), so there’s no a priori reason to suspect it’s even within 10^5.
Here is an example of Counterargument #3.
Upon further thought, the real reason that I reject Person 1’s argument is that everything should add up to normality, whereas Person 1’s conclusion is ridiculous at face value, and not in a “that seems like a paradox” way, more of a “who is this lunatic talking to me” way.
As I understand it, the scenario is that you’re hearing a complicated argument, and you don’t fully grok or internalize it. As advised by “Making Your Explicit Reasoning Trustworthy”, you have decided not to believe it fully.
The problem comes in the second argument—should you take the advice of the person (or meme) that you at least somewhat mistrust in “correcting” for your mistrust? As you point out, if the person (or the meme) is self-serving, then the original proposal and the correction procedure will fit together neatly to cause a successful mugging.
I think concerns like this suggest that we ought to be using some sort of robust statistics—that is, the direction of the argument and the fact that the conclusion is extreme should influence our conclusions, but the magnitude cannot be allowed to influence our conclusions.
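One crude way to operationalize that (a sketch under my own assumptions, not the commenter’s proposal, and the cap value is arbitrary): accept the direction of an argument’s claimed evidence, but clip how far any single argument can move your log-odds, so extreme claimed magnitudes stop mattering past a point.

```python
import math


def clipped_update(prior_log_odds: float,
                   claimed_log_likelihood_ratio: float,
                   cap: float = 2.0) -> float:
    """Shift log-odds by the argument's claimed evidence, clipped to +/- cap (nats).

    Direction and the fact of extremeness still count; magnitude beyond the
    cap does not.
    """
    shift = max(-cap, min(cap, claimed_log_likelihood_ratio))
    return prior_log_odds + shift


# An argument claiming a 10^20 likelihood ratio moves you no further than e^2 : 1.
print(clipped_update(prior_log_odds=-5.0,
                     claimed_log_likelihood_ratio=math.log(1e20)))  # -3.0
```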
The person need not even be self-serving. All people respond to incentives, and since publishing popular results is rewarding (in fame; often financially as well) the creators of novel arguments will become more likely to believe those arguments.
The link to Anna’s post in the footnotes is broken. Should be here.
Thanks, fixed.
I think so, for side reasons I go into in another comment reply: basically, in a situation with a ton of uncertainty and some evidence for the existence of a class of currently unknown but potentially extremely important things, one should “go meta” and put effort/resources into finding out how to track down such things, reason about such things, and reason about the known unknowns and unknown knowns of the class of such things. There are a variety of ways to go meta here that it would be absurd to fail to seriously consider.
I feel like there’s a way to LCPW your question, but I can’t off the top of my head think of an analogue satisfactorily close to the heart of your question that still seems meaningfully connected to the real world in the right ways. Anyone have suggestions, even if they’re probably bad?
My default expectation is for models like the one presented to be off by many orders of magnitude even when handed to me in well-written posts by people whose thinking I esteem in a context where perceived irrationality is received very harshly. If someone was speaking passable Bayesian at me and said there was a non-negligible chance that some strategy was significantly better than my current one, then I would Google the relevant aspects and find out for myself. Arguments like those advanced by Person 1 are indicators of things to look at. If the model uncertainty is so great, then put effort into lessening your model uncertainty.
Sometimes this is of course very hard for one to do given a few hours and internet access, and the lab universe is a good example of this. (In the lab universe case, some minutes should perhaps be spent on politely asking various cosmologists how they’d bet on the creation of a lab universe occurring at some future points, perhaps even bringing up AGI in this context.) But if so then surely there are other more effective and more psychologically realistic courses of action than just treating your model uncertainty as a given and then optimizing from ignorance.
I’d like to LCPW question three as well but I remain rather unconvinced that we have to deal with model uncertainty in this way. Even if there aren’t easy ways to better one’s map or some group’s map, which I doubt, I would sooner read many textbooks on cognitive biases, probability theory, maybe complex systems, and anything I could find about the risk/strategy in question, than engage in the policy of acting with such little reason, especially when there are personal/psychological/social costs.
I haven’t even finished reading this post yet, but it’s worth making explicit (because of the obvious connections to existential risks strategies in general) that the philanthropy in this case should arguably go towards research that searches for and identifies things like lab universe scenarios, research into how to search for or research such things (e.g. policies about dealing with basilisks at the individual and group levels), research into how to structure brains such that those brains won’t completely fail at said research or research generally, et cetera ad infinitum. Can someone please start a non-profit dedicated to the research and publication of “going meta”? Please?
ETA: I’m happy to see you talk about similar things with counterargument 3, but perhaps you could fuse an FAI (not necessarily CEV) argument with the considerations I mentioned above, e.g. put all of your money into building a passable oracle AI to help you think about how to be an optimal utilitarian (perhaps given some amount of information about what you think “your” “utility function(s)” might be (or what you think morality is)), or something more meta than that.
Research into bootstrapping current research to ideal research, research into cognitive comparative advantage, research into how to convince people to research such things or support the research of such things, research into what to do given that practically no one can research any of these things and even if they could no one would pay them to...