… but it discards all concerns outside of that. “If I regret my planet’s death then I regret it, and it’s beneath my dignity to pretend otherwise” does not rule out that there are other values you could achieve during the time available.
Another way to put that, perhaps, is that “knowing we did everything we could” doesn’t seem particularly dignified. Not if you had no meaningful expectation it could work. Extracting whatever other, potentially completely unrelated, value you could from the remaining available time would seem a lot more dignified to me than continuing on something you truly think is futile.
Extracting whatever other, potentially completely unrelated, value you could from the remaining available time would seem a lot more dignified to me than continuing on something you truly think is futile.
The amount of EV at stake in my (and others’) experiences over the next few years/decades is just too small compared to the EV at stake in the long-term future. The “let’s just give up” intuition makes sense in a regime where we’re comparing stakes that differ by 10x, or 100x, or 1000x; I think its appeal in this case comes from the fact that it’s hard to emotionally appreciate how much larger the quantities we’re talking about are.
(But the stakes don’t change just because it’s subjectively harder to appreciate them; and there’s nothing dignified about giving up prematurely because of an arithmetic error.)
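A toy version of the arithmetic being gestured at here (every figure is invented for illustration; log10 space is used only to keep the numbers readable):

```python
import math

# Hypothetical stakes, as log10 of "life-years" (all figures made up):
log_near_term = math.log10(8e9 * 50)   # everyone alive today, the next ~50 years: ~10^11.6
log_long_term = 35.0                   # a large long-term future: say 10^35 life-years

# Suppose your marginal effort shifts the probability of that future by one in a million:
log_delta_p = math.log10(1e-6)

log_ev_long_term = log_long_term + log_delta_p   # 35 - 6 = 29

print(f"near-term stakes:          ~10^{log_near_term:.1f} life-years")
print(f"EV of the long-term shift: ~10^{log_ev_long_term:.1f} life-years")
# Even after multiplying by the one-in-a-million probability shift, the long-term
# term exceeds the entire near-term stakes by more than 17 orders of magnitude
# under these (invented) assumptions -- a gap nothing like 10x or 1000x.
```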
I think that utilitarianism and actual human values are in different galaxies (example of more realistic model). There’s no way I would sacrifice a truly big chunk of present value (e.g. submit myself to a year of torture) to increase the probability of a good future by something astronomically small. Given Yudkowsky’s apparent beliefs about success probabilities, I might have given up on alignment research altogether[1].
On the other hand, I don’t inside-view!think the success probability is quite that small, and also the reasoning error that leads to endorsing utilitarianism seems positively correlated with the reasoning error that leads to extremely low success probability. Because, if you endorse utilitarianism then it generates a lot of confusion about the theory of rational agents, which makes you think there are more unsolved questions than there really are[2]. In addition, it makes value learning seem more hopeless than it actually is.
I have some reservations about posting these kinds of comments, because they might be contributing to shattering the shared illusion of utilitarianism and its associated ethos, an ethos whose aesthetics I like and which seems to motivate people to do good things. (Not to mention, these comments might cause people to think less of me.) On the other hand, the OP says we need to live inside reality and be honest with ourselves and with each other, and I agree with that.
[1] But maybe not, because it’s also rewarding in other ways.
[2] Ofc there are still many unsolved questions.
I think that utilitarianism and actual human values are in different galaxies (example of more realistic model). There’s no way I would sacrifice a truly big chunk of present value (e.g. submit myself to a year of torture) to increase the probability of a good future by something astronomically small.
I think that I’d easily accept a year of torture in order to produce ten planets worth of thriving civilizations. (Or, if I lack the resolve to follow through on a sacrifice like that, I still think I’d have the resolve to take a pill that causes me to have this resolve.)
‘Produce ten planets worth of thriving civilizations, with certainty’ feels a lot more tempting to me than ‘produce an entire universe of thriving civilizations, with tiny probability’, but I think that’s because I have a hard time imagining large quantities and because of irrational, non-endorsed attachment to certainties, not because of a deep divergence between my values and utilitarianism.
I do think my values are very non-utilitarian in tons of ways. A utilitarian would have zero preference for their own well-being over anyone else’s, would care just as much for strangers as for friends and partners, etc. This obviously doesn’t describe my values.
But the cases where this reflectively endorsed non-utilitarianism shows up are ones where I’m comparing, e.g., one family member’s happiness versus a stranger’s happiness, or the happiness of a few strangers. I don’t similarly feel that a family member of mine ought to matter more than an intergalactic network of civilizations. (And to the extent I do feel that way, I certainly don’t endorse it!)
On the other hand, the OP says we need to live inside reality and be honest with ourselves and with each other, and I agree with that.
Yes, if utilitarianism is wrong in the particular ways you think it is (which I gather is a strict superset of the ways I think it is?), then I want to believe as much. I very much endorse you sharing arguments to that effect! :)
[edit: looks like Rob posted elsethread a comment addressing my question here]
I’m a bit confused by this argument, because I thought MIRI-folk had been arguing against this specific type of logic. (I might be conflating a few different types of arguments, or might be assuming ‘well, Eliezer said this, so Rob automatically endorses it’, or some such.)
But, I recall recommendations to generally not try to get your expected value from multiplying tiny probabilities against big values, because a) in practice that tends to lead to cognitive errors, b) in many cases people were saying things like “x-risk is a small probability of a Very Bad Outcome”, when the actual argument was “x-risk is a big probability of a Very Bad Outcome.”
(Right now maybe you’re making a different argument, not about what humans should do, but about some underlying principles that would be true if we were better at thinking about things?)
Quoting the excerpt from Being Half-Rational About Pascal’s Wager is Even Worse that I quoted in the other comment:
[...] And finally, I once again state that I abjure, refute, and disclaim all forms of Pascalian reasoning and multiplying tiny probabilities by large impacts when it comes to existential risk. We live on a planet with upcoming prospects of, among other things, human intelligence enhancement, molecular nanotechnology, sufficiently advanced biotechnology, brain-computer interfaces, and of course Artificial Intelligence in several guises. If something has only a tiny chance of impacting the fate of the world, there should be something with a larger probability of an equally huge impact to worry about instead. You cannot justifiably trade off tiny probabilities of x-risk improvement against efforts that do not effectuate a happy intergalactic civilization, but there is nonetheless no need to go on tracking tiny probabilities when you’d expect there to be medium-sized probabilities of x-risk reduction.
[...] EDIT: To clarify, “Don’t multiply tiny probabilities by large impacts” is something that I apply to large-scale projects and lines of historical probability. On a very large scale, if you think FAI stands a serious chance of saving the world, then humanity should dump a bunch of effort into it, and if nobody’s dumping effort into it then you should dump more effort than currently into it. On a smaller scale, to compare two x-risk mitigation projects in demand of money, you need to estimate something about marginal impacts of the next added effort (where the common currency of utilons should probably not be lives saved, but “probability of an ok outcome”, i.e., the probability of ending up with a happy intergalactic civilization). In this case the average marginal added dollar can only account for a very tiny slice of probability, but this is not Pascal’s Wager. Large efforts with a success-or-failure criterion are rightly, justly, and unavoidably going to end up with small marginally increased probabilities of success per added small unit of effort. It would only be Pascal’s Wager if the whole route-to-an-OK-outcome were assigned a tiny probability, and then a large payoff used to shut down further discussion of whether the next unit of effort should go there or to a different x-risk.
From my perspective, the name of the game is ‘make the universe as a whole awesome’. Within that game, it would be silly to focus on unlikely fringe x-risks when there are high-probability x-risks to worry about; and it would be silly to focus on intervention ideas that have a one-in-a-million chance of vastly improving the future, when there are other interventions that have a one-in-a-thousand chance of vastly improving the future, for example.
That’s all in the context of debates between longtermist strategies and candidate megaprojects, which is what I usually assume is the discussion context. You could have a separate question that’s like ‘maybe I should give up on ~all the value in the universe and have a few years of fun playing sudoku and watching Netflix shows before AI kills me’.
In that context, the basic logic of anti-Pascalian reasoning still applies (easy existence proof: if working hard on x-risk raised humanity’s odds of survival from 1/10^{10^{100}} to 5/10^{10^{100}}, it would obviously not be worth working hard on x-risk), but I don’t think we’re anywhere near the levels of P(doom) that would warrant giving up on the future.
‘There’s no need to work on supervolcano-destroying-humanity risk when there are much more plausible risks like bioterrorism-destroying-humanity to worry about’ is a very different sort of mental move than ‘welp, humanity’s odds of surviving are merely 1-in-100, I guess the reasonable utility-maximizing thing to do now is to play sudoku and binge Netflix for a few years and then die’. 1-in-100 is a fake number I pulled out of a hat, but it’s an example of a very dire number that’s obviously way too high to justify humanity giving up on its future.
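A sketch of how the existence proof and the dire-but-not-Pascalian case come apart, done in log10 space so the doubly-exponential probability stays representable (the survival-odds figures are the ones quoted above; the future-value, cost, and 0.001 shift are invented):

```python
import math

LOG_VALUE_OF_FUTURE = 35.0   # hypothetical: a good future worth 10^35 (arbitrary units)
LOG_COST_OF_EFFORT = 11.0    # hypothetical: "humanity works hard on x-risk" costs ~10^11

def effort_is_worth_it(log_delta_p):
    """Straight EV test: value bought by the probability shift vs. the cost."""
    return LOG_VALUE_OF_FUTURE + log_delta_p > LOG_COST_OF_EFFORT

# Existence proof: survival odds go from 1/10^(10^100) to 5/10^(10^100),
# a shift of about 4 * 10^-(10^100).
print(effort_is_worth_it(math.log10(4) - 1e100))   # False: genuinely Pascalian, not worth it

# Dire-but-not-Pascalian case: P(doom) = 99% and effort shifts it by 0.001.
print(effort_is_worth_it(math.log10(1e-3)))        # True: still overwhelmingly worth it
```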
(This is all orthogonal to questions of motivation. Maybe, in order to avoid burning out, you need to take more vacation days while working on a dire-looking project, compared to the number of vacation days you’d need while working on an optimistic-looking project. That’s all still within the framework of ‘trying to do longtermist stuff’, while working with a human brain.)
One additional thing adding confusion is Nate Soares’ latest threads on wallowing* which… I think are probably compatible with all this, but I couldn’t pass the ITT on.
*I think his use of ‘wallowing’ is fairly nonstandard, you shouldn’t read into it until you’ve talked to him about it for at least an hour.
Where do I find these threads?
Ah, this was in-person. (“Threads” was more/differently metaphorical than usual)
I think that I’d easily accept a year of torture in order to produce ten planets worth of thriving civilizations. (Or, if I lack the resolve to follow through on a sacrifice like that, I still think I’d have the resolve to take a pill that causes me to have this resolve.)
I think that “resolve” is often a lie we tell ourselves to explain the discrepancies between stated and revealed preferences. I concede that if you took that pill, it would be evidence against my position (but, I believe you probably would not).
A nuance to keep in mind is that reciprocity can be a rational motivation to behave more altruistically than you otherwise would. This can come about from tit-for-tat / reputation systems, or even from some kind of acausal cooperation. Reciprocity is effectively moving us closer to utilitarianism, but certainly not all the way there.
So, if I’m weighing the life of my son or daughter against an intergalactic network of civilizations, which I never heard of before and am never going to hear about after, and which wouldn’t even reciprocate in a symmetric scenario, I’m choosing my child for sure.
If I knew as a certainty that I cannot do nearly as much good some other way, and I was certain that taking the pill causes that much good, I’d take the pill, even if I die after the torture and no one will know I sacrificed myself for others.
I admit those are quite unusual values for a human, and I’m not arguing that it would be rational because of utilitarianism or anything like that, just that I would do it. (It’s possible that I’m wrong, but I think it’s very likely I’m not.) Also, I see that, the way my brain is wired, outer optimization pushes against that policy, and I think I probably wouldn’t be able to take the pill a second time under the same conditions (given that I don’t die after the torture), or at least not often.
I don’t think those are unusual values for a human. Many humans have sacrificed their lives (and endured great hardship and pain, etc.) to help others. And many more would take a pill to gain that quality, seeing it as a more courageous and high-integrity expression of their values.
I think that I’d easily accept a year of torture in order to produce ten planets worth of thriving civilizations. (Or, if I lack the resolve to follow through on a sacrifice like that, I still think I’d have the resolve to take a pill that causes me to have this resolve.)
I’d do this to save ten planets’ worth of thriving civilizations, but doing it to produce ten planets’ worth of thriving civilizations seems unreasonable to me. Nobody is harmed by preventing their birth, and I have very little confidence either way as to whether their existence will wind up increasing the average utility of all lives ever eventually lived.
I used to favour average utilitarianism too, until I learned about the sadistic conclusion. That was sufficient to overcome any aversion I had to the repugnant conclusion.
I’m happy to accept the sadistic conclusion as normally stated, and in general I find “what would I prefer if I were behind the Rawlsian Veil and going to be assigned at random to one of the lives ever actually lived” an extremely compelling intuition pump. (Though there are other edge cases that I feel weirder about, e.g. is a universe where everyone has very negative utility really improved by adding lots of new people of only somewhat negative utility?)
As a practical matter though I’m most concerned that total utilitarianism could (not just theoretically but actually, with decisions that might be locked-in in our lifetimes) turn a “good” post-singularity future into Malthusian near-hell where everyone is significantly worse off than I am now, whereas the sadistic conclusion and other contrived counterintuitive edge cases are unlikely to resemble decisions humanity or an AGI we create will actually face. Preventing the lock-in of total utilitarian values therefore seems only a little less important to me than preventing extinction.
Another question. Imagine a universe with either only 5 or 10 people. If they’re all being tortured equally badly at a level of −100 utility, are you sure you’re indifferent as to the number of people existing? Isn’t less better here?
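A minimal sketch of how the two aggregation rules treat this example (the utilities are the ones from the question; the last line is a variant of the earlier "adding somewhat-negative lives" worry):

```python
def total_utility(utilities):
    return sum(utilities)

def average_utility(utilities):
    return sum(utilities) / len(utilities)

five_tortured = [-100] * 5
ten_tortured = [-100] * 10

print(total_utility(five_tortured), total_utility(ten_tortured))      # -500 -1000: total prefers fewer
print(average_utility(five_tortured), average_utility(ten_tortured))  # -100.0 -100.0: average is indifferent

# The earlier edge case: adding many lives at -1 to five lives at -100
# raises the average even though every added life is negative.
print(average_utility([-100] * 5 + [-1] * 20))                        # -20.8
```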
Yeah that’s essentially the example I mentioned that seems weirder to me, but I’m not sure, and at any rate it seems much further from the sorts of decisions I actually expect humanity to have to make than the need to avoid Malthusian futures.
Well, that’s it, your access to the medicine cabinet is revoked. :-)
You can say that it’s wrong to think you can actually measure and usefully aggregate utility functions. That’s truly a matter of fact… but to be able to say that utilitarianism in all its forms was “wrong” would require an external standard.
Utilitarianism as a meta-ethical theory can be wrong without being ethically wrong. Meta-ethical theories can be criticised for being contradictory, unworkable, irrelevant, etc.
… but to be able to say that utilitarianism in all its forms was “wrong” would require an external standard. Ethical realism really is wrong.
Utilitarianism can be wrong as a description of actual human values, or of ‘the values humans would self-modify to if they fully understood the consequences of various self-modification paths’.
OK, but that’s an is-ought issue. I didn’t perceive the question as being about factual human values, but about what people should do. It’s an ethical system, after all, not a scientific system.
Your definition is wrong; I think that way of defining ‘utilitarianism’ is purely an invention of a few LWers who didn’t understand what the term meant and got it mixed up with ‘utility function’. AFAIK, there’s no field where ‘utilitarianism’ has ever been used to mean ‘having a utility function’.
I had this confusion for a few years. It personally made me dislike the term utilitarian, because it really mismatched my internal ontology.
Hm, I worry I might be a confused LWer. I definitely agree that “having a utility function” and “being a utilitarian” are not identical concepts, but they’re highly related, no? Would you agree that, to a first-approximation, being a utilitarian means having a utility function with the evolutionary godshatter as terminal values? Even this is not identical to the original philosophical meaning I suppose, but it seems highly similar, and it is what I thought people around here meant.
Would you agree that, to a first-approximation, being a utilitarian means having a utility function with the evolutionary godshatter as terminal values?
This is not even close to correct, I’m afraid. In fact being a utilitarian has nothing whatever to do with the concept of a utility function. (Nor—separately—does it have much to do with “evolutionary godshatter” as values; I am not sure where you got this idea!)
Please read this page for some more info presented in a systematic way.
I meant to convey a utility function with certain human values as terminal values, such as pleasure, freedom, beauty, etc.; godshatter was a stand-in.
If the idea of a utility function has literally nothing to do with moral utilitarianism, even around here, I would question why, in the post above, Eliezer references expected utility calculations when he is discussing moral questions. I would also point to “intuitions behind utilitarianism” as pointing at connections between the two? Or “shut up and multiply”? Need I go on?
I know classical utilitarianism is not exactly the same, but even in what you linked, it talks about maximizing the total sum of human happiness and sacrificing some goods for others, measured under a single metric “utility”. That sounds an awful lot like a utility function trading off human terminal values? I don’t see how what I’m pointing at isn’t just a straightforward idealization of classical utilitarianism.
I meant to convey a utility function with certain human values as terminal values, such as pleasure, freedom, beauty, etc.; godshatter was a stand-in.
Yes, I understood your meaning. My response stands.
If the idea of a utility function has literally nothing to do with moral utilitarianism, even around here, I would question why, in the post above, Eliezer references expected utility calculations when he is discussing moral questions.
What is the connection? Expected utility calculations can be, and are, relevant to all sorts of things, without being identical to, or similar to, or inherently connected with, etc., utilitarianism.
I would also point to “intuitions behind utilitarianism“ as pointing at connections between the two? Or “shut up and multiply”? Need I go on?
The linked post makes some subtle points, as well as some subtle mistakes (or, perhaps, instances of unclear writing on Eliezer’s part; it’s hard to tell).
I know classical utilitarianism is not exactly the same, but even in what you linked, it talks about maximizing the total sum of human happiness and sacrificing some goods for others, measured under a single metric “utility”. That sounds an awful lot like a utility function trading off human terminal values? I don’t see how what I’m pointing at isn’t just a straightforward idealization of classical utilitarianism.
The “utility” of utilitarianism and the “utility” of expected utility theory are two very different concepts that, quite unfortunately and confusingly, share a term. This is a terminological conflation, in other words.
None of what you have linked so far has particularly conveyed any new information to me, so I think I just flatly disagree with you. As that link says, the “utility” in utilitarianism just means some metric or metrics of “good”. People disagree about what exactly should go into “good” here, but godshatter refers to all the terminal values humans have, so that seems like a perfectly fine candidate for what the “utility” in utilitarianism ought to be. The classic “higher pleasures” in utilitarianism lends credence toward this fitting into the classical framework; it is not a new idea that utilitarianism can include multiple terminal values with relative weighting.
Under utilitarianism, we are then supposed to maximize this utility. That is, maximize the satisfaction of the various terminal goals we are taking as good, aggregated into a single metric. And separately, there happens to be this elegant idea called “utility theory”, which tells us that if you have various preferences you are trying to maximize, there is a uniquely rational way to do that, which involves giving them relative weights and aggregating into a single metric… You seriously think there’s no connection here? I honestly thought all this was obvious.
In that last link, they say “Now, it is sometimes claimed that one may use decision-theoretic utility as one possible implementation of the utilitarian’s ‘utility’” then go on to say why this is wrong, but I don’t find it to be a knockdown argument; that is basically what I believe and I think I stand by it. Like, if you plug “aggregate human well-being along all relevant dimensions” into the utility of utility theory, I don’t see how you don’t get exactly utilitarianism out of that, or at least one version of it?
EDIT: Please also see in the above post under “You should never try to reason using expected utilities again. It is an art not meant for you. Stick to intuitive feelings henceforth.” It seems to me that Eliezer goes on to consistently use the “expected utilities” of utility theory to be synonymous to the “utilities” of utilitarianism and the “consequences” of consequentialism. Do you agree that he’s doing this? If so, I assume you think he’s wrong for doing it? Eliezer tends to call himself a utilitarian. Do you agree that he is one, or is he something else? What would you call “using expected utility theory to make moral decisions, taking the terminal value to be human well-being”?
In that last link, they say “Now, it is sometimes claimed that one may use decision-theoretic utility as one possible implementation of the utilitarian’s ‘utility’” then go on to say why this is wrong, but I don’t find it to be a knockdown argument; that is basically what I believe and I think I stand by it. Like, if you plug “aggregate human well-being along all relevant dimensions” into the utility of utility theory, I don’t see how you don’t get exactly utilitarianism out of that, or at least one version of it?
You don’t get utilitarianism out of it because, as explained at the link, VNM utility is incomparable between agents (and therefore cannot be aggregated across agents). There are no versions of utilitarianism that can be constructed out of decision-theoretic utility. This is an inseparable part of the VNM formalism.
That having been said, even if it were possible to use VNM utility as the “utility” of utilitarianism (again, it is definitely not!), that still wouldn’t make them the same theory, or necessarily connected, or conceptually identical, or conceptually related, etc. Decision-theoretic expected utility theory isn’t a moral theory at all.
Really, this is all explained in the linked post…
Re: the “EDIT:” part:
It seems to me that Eliezer goes on to consistently use the “expected utilities” of utility theory to be synonymous to the “utilities” of utilitarianism and the “consequences” of consequentialism. Do you agree that he’s doing this?
No, I do not agree that he’s doing this.
Eliezer tends to call himself a utilitarian. Do you agree that he is one, or is he something else?
What would you call “using expected utility theory to make moral decisions, taking the terminal value to be human well-being”?
I would call that “being confused”.
How to (coherently, accurately, etc.) map “human well-being” (whatever that is) to any usable scalar (not vector!) “utility” which you can then maximize the expectation of, is probably the biggest challenge and obstacle to any attempt at formulating a moral theory around the intuition you describe. (“Utilitarianism using VNM utility” is a classic failed and provably unworkable attempt at doing this.)
If you don’t have any way of doing this, you don’t have a moral theory—you have nothing.
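A small illustration of the non-aggregability point (all preferences and numbers are hypothetical): a VNM utility function is only defined up to a positive affine transformation per agent, so rescaling one agent's representation, which changes none of that agent's preferences, can flip a naive "sum of utilities" ranking.

```python
# One arbitrary VNM representation per agent (any u -> a*u + b with a > 0
# represents the same preferences for that agent).
alice = {"outcome_1": 1.0, "outcome_2": 0.0}
bob = {"outcome_1": 0.0, "outcome_2": 0.6}

def naive_aggregate(outcome, agents):
    return sum(agent[outcome] for agent in agents)

print(naive_aggregate("outcome_1", [alice, bob]) >
      naive_aggregate("outcome_2", [alice, bob]))           # True: outcome_1 "wins"

# Rescale Bob's utilities: same agent, same preferences, different representation.
bob_rescaled = {k: 10 * v + 3 for k, v in bob.items()}

print(naive_aggregate("outcome_1", [alice, bob_rescaled]) >
      naive_aggregate("outcome_2", [alice, bob_rescaled]))  # False: the ranking flipped
```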
If the idea of a utility function has literally nothing to do with moral utilitarianism, even around here, I would question why, in the post above, Eliezer references expected utility calculations when he is discussing moral questions.
If he has a proof that utilitarianism, as usually defined (the highly altruistic ethical theory), is equivalent to maximization of an arbitrary UF, given some considerations about coherence, then he has something extraordinary that should be widely known.
Or he is using “utilitarianism” in a weird way… or he is not and he is just confused.
I said nothing about an arbitrary utility function (nor proof for that matter). I was saying that applying utility theory to a specific set of terminal values seems to basically get you an idealized version of utilitarianism, which is what I thought the standard moral theory was around here.
If you know the utility function that is objectively correct, then you have the correct metaethics, and VNM-style utility maximisation only tells you how to implement it efficiently.
The first thing is “utilitarianism is true”, the second thing is “rationality is useful”.
But that goes back to the issue everyone criticises: EY recommends an object-level decision… prefer torture to dust specks… unconditionally, without knowing the reader’s UF.
If he had succeeded in arguing, or even tried to argue, that there is one true objective UF, then he would be in a position to hand out unconditional advice.
Or if he could show that preferring torture to dust specks was rational given an arbitrary UF, then he could also hand out unconditional advice (in the sense that the conditioning on a subjective UF doesn’t make a difference). But he doesn’t do that, because if someone has a UF that places negative infinity utility on torture, that’s not up for grabs… their personal UF is what it is.
Because, if you endorse utilitarianism then it generates a lot of confusion about the theory of rational agents, which makes you think there are more unsolved questions than there really are[2].
Are you alluding to agents with VNM utility functions here?
I assign much lower value than a lot of people here to some vast expansionist future… and I suspect that even if I’m in the minority, I’m not the only one.
Can you be more explicit about the arithmetic? Would increasing the probability of civilization existing 1000 years from now from 10^{-7} to 10^{-6} be worth more or less to you than receiving a billion dollars right now?
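For concreteness, the arithmetic implicit in this question (the probabilities are the ones stated; treating the answer as a straight expected-value break-even is a simplifying assumption):

```python
p_without = 1e-7   # probability of civilization existing in 1000 years, baseline
p_with = 1e-6      # probability after the hypothetical intervention
price = 1e9        # dollars offered now

delta_p = p_with - p_without   # 9e-07
breakeven = price / delta_p    # ~1.1e15 dollars

print(f"probability gain: {delta_p:.1e}")
print(f"declining the money is the EV-maximizing choice iff you value that future at more than ~${breakeven:.2e}")
```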
Do I get any information about what kind of civilization I’m getting, and/or about what it would be doing during the 1000 years or after the 1000 years?
On edit: Removed the “by how much” because I figured out how to read the notation that gave the answer.
I guess by “civilization” I meant “civilization whose main story is still being meaningfully controlled by humans who are individually similar to modern humans”. Other than that, I just mean your current expectations about what that civilization is like, conditioned on it existing.
(It seems like you could be disagreeing with “a lot of people here” about what those futures look like or how valuable they are or both—I’d be happy to get clarification on either front.)
Hmm. I should have asked what the alternative to civilization was going to be.
Nailing it down to a very specific question, suppose my alternatives are...
I get a billion dollars. My life goes on as normal otherwise. Civilization does whatever it’s going to do; I’m not told what. Omega tells me that everybody will suddenly drop dead at some time within 1000 years, for reasons I don’t get to know, with probability one minus one in ten million.
… versus...
I do not get a billion dollars. My life goes on as normal otherwise. Civilization does whatever it’s going to do; I’m not told what. Omega tells me that everybody will suddenly drop dead at some time within 1000 years, for reasons I don’t get to know, with probability one minus one in one million.
… then I don’t think I take the billion dollars. Honestly the only really interesting thing I can think of to do with that kind of money would be to play around with the future of civilization anyway.
I think that’s probably the question you meant to ask.
However, that’s a very, very specific question, and there are lots of other hypotheticals you could come up with.
The “civilization whose main story is still being meaningfully controlled by humans etc.” thing bothers me. If a utopian godlike friendly AI were somehow on offer, I would actively pay money to take control away from humans and hand it to that AI… especially if I or people I personally care about had to live in that world. And there could also be valuable modes of human life other than civilization. Or even nonhuman things that might be more valuable. If those were my alternatives, and I knew that to be the case, then my answer might change.
For that matter, even if everybody were somehow going to die, my answer could depend on how civilization was going to end and what it was going to do before ending. A jerkass genie Omega might be withholding information and offering me a bum deal.
Suppose I knew that civilization would end because everybody had agreed, for reasons I cannot at this time guess, that the project was in some sense finished, all the value extracted, so they would just stop reproducing and die out quietly… and, perhaps implausibly, that conclusion wasn’t the result of some kind of fucked up mind control. I wouldn’t want to second-guess the future on that.
Similarly, what if I knew civilization would end because the alternative was some also as yet unforeseen fate worse than death? I wouldn’t want to avoid x-risk by converting it into s-risk.
In reality, of course, nobody’s offering me clearcut choices at all. I kind of bumble along, and thereby I (and of course others) sculpt my future light cone into some kind of “work of art” in some largely unpredictable way.
Basically what I’m saying is that, insofar as I consciously control that work of art, pure size isn’t the aesthetic I’m looking for. Beyond a certain point, size might be a negative. 1000 years is one thing, but vast numbers of humans overrunning galaxy after galaxy over billions of years, while basically doing the same old stuff, seems pointless to me.
Thanks for all the detail, and for looking past my clumsy questions!
It sounds like one disagreement you’re pointing at is about the shape of possible futures. You value “humanity colonizes the universe” far less than some other people do. (maybe rob in particular?) That seems sane to me.
The near-term decision questions that brought us here were about how hard to fight to “solve the alignment problem,” whatever that means. For that, the real question is about the difference in total value of the future conditioned on “solving” it and conditioned on “not solving” it. You think there are plausible distributions on future outcomes so that 1 one-millionth of the expected value of those futures is worth more to you than personally receiving 1 billion dollars.
Putting these bits together, I would guess the amount of value at stake is not really the thing driving disagreement here, but about the level of futility? Say you think humanity overall has about a 1% chance of succeeding with a current team of 1000 full-time-equivalents working on the problem. Do you want to join the team in that case? What if we have a one-in-one-thousand chance and a current team of 1 million? Do these seem like the right units to talk about the disagreement in?
(Another place that I thought there might be a disagreement: do you think solving the alignment problem increases or decreases s-risk? Here “solving the alignment problem” is the thing that we’re discussing giving up on because it’s too futile.)
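One crude way to put units on the two scenarios above (this assumes, unrealistically, that success probability scales linearly with team size, purely to make the comparison concrete):

```python
def marginal_delta_p_per_person(p_success, team_size):
    # Linear-returns toy model: each full-time-equivalent contributes equally.
    return p_success / team_size

scenario_a = marginal_delta_p_per_person(0.01, 1_000)       # 1e-05 per extra FTE
scenario_b = marginal_delta_p_per_person(0.001, 1_000_000)  # 1e-09 per extra FTE

print(scenario_a, scenario_b)
# Whether either number justifies a career then depends entirely on how much
# value you assign to the future it multiplies -- which is what the earlier
# billion-dollar question was probing.
```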
In some philosophical sense, you have to multiply the expected value by the estimated chance of success. They both count. But I’m not sitting there actually doing multiplication, because I don’t think you can put good enough estimates on either one to make the result meaningful.
In fact, I guess that there’s a better than 1 percent chance of avoiding AI catastrophe in real life, although I’m not sure I’d want to (a) put a number on it, (b) guess how much of the hope is in “solving alignment” versus the problem just not being what people think it will be, (c) guess how much influence my or anybody else’s actions would have on moving the probability[edited from “property”...], or even (d) necessarily commit to very many guesses about which actions would move the probability in which directions. I’m just generally not convinced that the whole thing is predictable down to 1 percent at all.
In any case, I am not in fact working on it.
I don’t actually know what values I would put on a lot of futures, even the 1000 year one. Don’t get hung up on the billion dollars, because I also wouldn’t take a billion dollars to singlemindedly dedicate the remainder of my life, or even my “working time”, to anything in particular unless I enjoyed it. Enjoying life is something you can do with relative certainty, and it can be enough even if you then die. That can be a big enough “work of art”. Everybody up to this point has in fact died, and they did OK.
For that matter, I’m about 60 years old, so I’m personally likely to die before any of this stuff happens… although I do have a child and would very much prefer she didn’t have to deal with anything too awful.
I guess I’d probably work on it if I thought I had a large, clear contribution to make to it, but in fact I have absolutely no idea at all how to do it, and no reason to expect I’m unusually talented at anything that would actually advance it.
do you think solving the alignment problem increases or decreases s-risk
If you ended up enacting a serious s-risk, I don’t understand how you could say you’d solved the alignment problem. At least not unless the values you were aligning with were pretty ugly ones.
I will admit that sometimes I think other people’s ideas of good outcomes sound closer to s-risks than I would like, though. If you solved the problem of aligning with those people, I might see it as an increase.
Have you considered local movement building? Perhaps, something simple like organising dinners or a reading group to discuss these issues? Maybe no-one would come, but it’s hard to say unless you give it a go and, in any case, a small group of two or three thoughtful people is more valuable than a much larger group of people who are just there to pontificate without really thinking through anything deeply.
The amount of EV at stake in my (and others’) experiences over the next few years/decades is just too small compared to the EV at stake in the long-term future.
AI alignment isn’t the only option to improve the EV of the long-term future, though.
Previously comparisons between the case for AI xrisk mitigation and Pascal’s Mugging were rightly dismissed on the grounds that the probability of AI xrisk is not actually that small at all. But if the probability of averting the xrisk is as small as discussed here then the comparison with Pascal’s Mugging is entirely appropriate.
[...] I had originally intended the scenario of Pascal’s Mugging to point up what seemed like a basic problem with combining conventional epistemology with conventional decision theory: Conventional epistemology says to penalize hypotheses by an exponential factor of computational complexity. This seems pretty strict in everyday life: “What? for a mere 20 bits I am to be called a million times less probable?” But for stranger hypotheses about things like Matrix Lords, the size of the hypothetical universe can blow up enormously faster than the exponential of its complexity. This would mean that all our decisions were dominated by tiny-seeming probabilities (on the order of 2^{-100} and less) of scenarios where our lightest action affected 3↑↑4 people… which would in turn be dominated by even more remote probabilities of affecting 3↑↑5 people...
[...] Unfortunately I failed to make it clear in my original writeup that this was where the problem came from, and that it was general to situations beyond the Mugger. Nick Bostrom’s writeup of Pascal’s Mugging for a philosophy journal used a Mugger offering a quintillion days of happiness, where a quintillion is merely 1,000,000,000,000,000,000 = 10^{18}. It takes at least two exponentiations to outrun a singly-exponential complexity penalty. I would be willing to assign a probability of less than 1 in 10^{18} to a random person being a Matrix Lord. You may not have to invoke 3↑↑↑3 to cause problems, but you’ve got to use something like 10^{10^{100}} - double exponentiation or better. Manipulating ordinary hypotheses about the ordinary physical universe taken at face value, which just contains 10^{80} atoms within range of our telescopes, should not lead us into such difficulties.
(And then the phrase “Pascal’s Mugging” got completely bastardized to refer to an emotional feeling of being mugged that some people apparently get when a high-stakes charitable proposition is presented to them, regardless of whether it’s supposed to have a low probability. This is enough to make me regret having ever invented the term “Pascal’s Mugging” in the first place; and for further thoughts on this see The Pascal’s Wager Fallacy Fallacy (just because the stakes are high does not mean the probabilities are low, and Pascal’s Wager is fallacious because of the low probability, not the high stakes!) and Being Half-Rational About Pascal’s Wager Is Even Worse. Again, when dealing with issues the mere size of the apparent universe, on the order of 10^{80} - for small large numbers—we do not run into the sort of decision-theoretic problems I originally meant to single out by the concept of “Pascal’s Mugging”. My rough intuitive stance on x-risk charity is that if you are one of the tiny fraction of all sentient beings who happened to be born here on Earth before the intelligence explosion, when the existence of the whole vast intergalactic future depends on what we do now, you should expect to find yourself surrounded by a smorgasbord of opportunities to affect small large numbers of sentient beings. There is then no reason to worry about tiny probabilities of having a large impact when we can expect to find medium-sized opportunities of having a large impact, so long as we restrict ourselves to impacts no larger than the size of the known universe.)
[...] And finally, I once again state that I abjure, refute, and disclaim all forms of Pascalian reasoning and multiplying tiny probabilities by large impacts when it comes to existential risk. We live on a planet with upcoming prospects of, among other things, human intelligence enhancement, molecular nanotechnology, sufficiently advanced biotechnology, brain-computer interfaces, and of course Artificial Intelligence in several guises. If something has only a tiny chance of impacting the fate of the world, there should be something with a larger probability of an equally huge impact to worry about instead. You cannot justifiably trade off tiny probabilities of x-risk improvement against efforts that do not effectuate a happy intergalactic civilization, but there is nonetheless no need to go on tracking tiny probabilities when you’d expect there to be medium-sized probabilities of x-risk reduction.
[...] EDIT: To clarify, “Don’t multiply tiny probabilities by large impacts” is something that I apply to large-scale projects and lines of historical probability. On a very large scale, if you think FAI stands a serious chance of saving the world, then humanity should dump a bunch of effort into it, and if nobody’s dumping effort into it then you should dump more effort than currently into it. On a smaller scale, to compare two x-risk mitigation projects in demand of money, you need to estimate something about marginal impacts of the next added effort (where the common currency of utilons should probably not be lives saved, but “probability of an ok outcome”, i.e., the probability of ending up with a happy intergalactic civilization). In this case the average marginal added dollar can only account for a very tiny slice of probability, but this is not Pascal’s Wager. Large efforts with a success-or-failure criterion are rightly, justly, and unavoidably going to end up with small marginally increased probabilities of success per added small unit of effort. It would only be Pascal’s Wager if the whole route-to-an-OK-outcome were assigned a tiny probability, and then a large payoff used to shut down further discussion of whether the next unit of effort should go there or to a different x-risk.
If I understand your argument, it’s something like “when the probability of the world being saved is below n%, humanity should stop putting any effort into saving the world”. Could you clarify what value of n (roughly) you think justifies “let’s give up”?
(If we just speak in qualitative terms, we’re more likely to just talk past each other. E.g., making up numbers: maybe you’ll say ‘we should give up if the world is only one-in-a-million likely to survive’, and Eliezer will reply ‘oh, of course, but our survival odds are way higher than that’. Or maybe you’ll say ‘we should give up if the world is only one-in-fifty likely to survive’, and Eliezer will say ‘that sounds like the right ballpark for how dire our situation is, but that seems way too early to simply give up’.)
I think:
- Humans are bad at informal reasoning about small probabilities since they don’t have much experience to calibrate on, and will tend to overestimate the ones brought to their attention, so informal estimates of the probability of very unlikely events should usually be adjusted even lower.
- Humans are bad at reasoning about large utilities, due to lack of experience as well as issues with population ethics and the mathematical issues with unbounded utility, so estimates of large utilities of outcomes should usually be adjusted lower.
- Throwing away most of the value in the typical case for the sake of an unlikely case seems like a dubious idea to me even if your probabilities and utility estimates are entirely correct; the lifespan dilemma and similar results are potential intuition pumps about the issues with this, and go through even with only single-exponential utilities at each stage. Accordingly I lean towards overweighting the typical range of outcomes in my decision theory relative to extreme outcomes, though there are certainly issues with this approach as well.
As far as where the penalty starts kicking in quantitatively, for personal decisionmaking I’d say somewhere around “unlikely enough that you expect to see events at least this extreme less than once per lifetime”, and for altruistic decisionmaking “unlikely enough that you expect to see events at least this extreme less than once in the history of humanity”. For something on the scale of AI alignment I think that’s around 1/1000? If you think the chances of success are still over 1% then I withdraw my objection.
The Pascalian concern aside I note that the probability of AI alignment succeeding doesn’t have to be *that* low before its worthwhileness becomes sensitive to controversial population ethics questions. If you don’t consider lives averted to be a harm then spending $10B to decrease the chance of 10 billion deaths by 1/10000 is worse value than AMF. If you’re optimizing for the average utility of all lives eventually lived then increasing the chance of a flourishing future civilization to pull up the average is likely worth more but plausibly only ~100x more (how many people would accept a 1% chance of postsingularity life for a 99% chance of immediate death?) so it’d still be a bad bet below 1/1000000. (Also if decreasing xrisk increases srisk, or if the future ends up run by total utilitarians, it might actually pull the average down.)
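Spelling out the first comparison in that paragraph under its own stated assumptions (the AMF figure below is a rough placeholder for a GiveWell-style cost-per-life estimate, not a number from the comment):

```python
spend = 10e9                 # $10B on x-risk reduction
population_at_risk = 10e9    # 10 billion people
risk_reduction = 1e-4        # chance of those deaths reduced by 1/10000

expected_lives_saved = population_at_risk * risk_reduction   # 1,000,000
cost_per_expected_life = spend / expected_lives_saved        # $10,000

amf_cost_per_life = 5_000    # placeholder estimate

print(cost_per_expected_life)                       # 10000.0
print(cost_per_expected_life > amf_cost_per_life)   # True: worse value than AMF on these numbers
```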
If I’m understanding the original question correctly (and if not, well, I’m asking it myself), the issue is that, as you just pointed out, there are plenty of non-AI-related massive threats to humanity that we may be able to avert with far higher likelihood (assuming we survive long enough to be able to do so). If the probability of changing the AGI end-of-the-world situation is extremely low, and if that was the only potential danger to humanity, then of course we should still focus on it. However, we also face many other risks we actually stand a chance of fighting, and according to Yudkowsky’s line of thinking, we should act for the counterfactual world in which we somehow solve the alignment problem. Therefore, shouldn’t we be focusing more on other issues, if the probabilities are really that bad?
There’s some case for it but I’d generally say no. Usually when voting you are coordinating with a group of people with similar decision algorithms who you have some ability to communicate with, and the chance of your whole coordinated group changing the outcome is fairly large, and your own contribution to it pretty legible. This is perhaps analogous to being one of many people working on AI safety if you believe that the chance that some organization solves AI safety is fairly high (it’s unlikely that your own contributions will make the difference but you’re part of a coordinated effort that likely will). But if you believe it is extremely unlikely that anybody will solve AI safety then the whole coordinated effort is being Pascal-Mugged.
If the probability of success goes small enough, then yes, you give up on the long-term future. But I don’t think we can realistically reach that point, even if the situation gets a lot more dire than it looks today. It matters what the actual probabilities look like (e.g., 1 in 20 and 1 in 1000 are both very different from 1 in googolplex).
None of this implies that you should disregard your own happiness, stop caring about your friends, etc. Even if human brains were strictly utilitarian (which they’re not), disregarding your happiness, being a jerk, etc. are obviously terrible strategies for optimizing the long-term future. Genuinely taking the enormous stakes into account in your decision-making doesn’t require that you adopt dumb or self-destructive policies that inevitably cause burn-out, poor coordination, etc.
… but it discards all concerns outside of that. “If I regret my planet’s death then I regret it, and it’s beneath my dignity to pretend otherwise” does not imply that there might not be other values you could achieve during the time available.
Another way to put that, perhaps, is that “knowing we did everything we could” doesn’t seem particularly dignified. Not if you had no meaningful expectation it could work. Extracting whatever other, potentially completely unrelated, value you could from the remaining available time would seem a lot more dignified to me than continuing on something you truly think is futile.
The amount of EV at stake in my (and others’) experiences over the next few years/decades is just too small compared to the EV at stake in the long-term future. The “let’s just give up” intuition makes sense in a regime where we’re comparing stakes that differ by 10x, or 100x, or 1000x; I think its appeal in this case comes from the fact that it’s hard to emotionally appreciate how much larger the quantities we’re talking about are.
(But the stakes don’t change just because it’s subjectively harder to appreciate them; and there’s nothing dignified about giving up prematurely because of an arithmetic error.)
I think that utilitarianism and actual human values are in different galaxies (example of more realistic model). There’s no way I would sacrifice a truly big chunk of present value (e.g. submit myself to a year of torture) to increase the probability of a good future by something astronomically small. Given Yudkowsky’s apparent beliefs about success probabilities, I might have given up on alignment research altogether[1].
On that other hand, I don’t inside-view!think the success probability is quite that small, and also the reasoning error that leads to endorsing utilitarianism seems positively correlated with the reasoning error that leads to extremely low success probability. Because, if you endorse utilitarianism then it generates a lot of confusion about the theory of rational agents, which makes you think there are more unsolved questions than there really are[2]. In addition, value learning seems more hopeless than it actually is.
I have some reservations about posting this kind of comments, because they might be contributing to shattering the shared illusion of utilitarianism and its associated ethos, an ethos whose aesthetics I like and which seems to motivate people to do good things. (Not to mention, these comments might cause people to think less of me.) On the other hand, the OP says we need to live inside reality and be honest with ourselves and with each other, and I agree with that.
But maybe not, because it’s also rewarding in other ways.
Ofc there are still many unsolved questions.
I think that I’d easily accept a year of torture in order to produce ten planets worth of thriving civilizations. (Or, if I lack the resolve to follow through on a sacrifice like that, I still think I’d have the resolve to take a pill that causes me to have this resolve.)
‘Produce ten planets worth of thriving civilizations, with certainty’ feels a lot more tempting to me than ‘produce an entire universe of thriving civilizations, with tiny probability’, but I think that’s because I have a hard time imagining large quantities and because of irrational, non-endorsed attachment to certainties, not because of a deep divergence between my values and utilitarianism.
I do think my values are very non-utilitarian in tons of ways. A utilitarian would have zero preference for their own well-being over anyone else’s, would care just as much for strangers as for friends and partners, etc. This obviously doesn’t describe my values.
But the cases where this reflectively endorsed non-utilitarianism shows up, are ones where I’m comparing, e.g., one family member’s happiness versus a stranger’s happiness, or the happiness of a few strangers. I don’t similarly feel that a family member of mine ought to matter more than an intergalactic network of civilizations. (And to the extent I do feel that way, I certainly don’t endorse it!)
Yes, if utilitarianism is wrong in the particular ways you think it is (which I gather is a strict superset of the ways I think it is?), then I want to believe as much. I very much endorse you sharing arguments to that effect! :)
[edit: looks like Rob posted elsethread a comment addressing my question here]
I’m a bit confused by this argument, because I thought MIRI-folk had been arguing against this specific type of logic. (I might be conflating a few different types of arguments, or might be conflating ‘well, Eliezer said this, so Rob automatically endorses it’, or some such).
But, I recall recommendations to generally not try to get your expected value from multiplying tiny probabilities against big values, because a) in practice that tends to lead to cognitive errors, b) in many cases people were saying things like “x-risk is a small probability of a Very Bad Outcome”, when the actual argument was “x-risk is a big probability of a Very Bad Outcome.”
(Right now maybe you’re maybe making a different argument, not about what humans should do, but about some underlying principles that would be true if we were better at thinking about things?)
Quoting the excerpt from Being Half-Rational About Pascal’s Wager is Even Worse that I quoted in the other comment:
From my perspective, the name of the game is ‘make the universe as a whole awesome’. Within that game, it would be silly to focus on unlikely fringe x-risks when there are high-probability x-risks to worry about; and it would be silly to focus on intervention ideas that have a one-in-a-million chance of vastly improving the future, when there are other interventions that have a one-in-a-thousand chance of vastly improving the future, for example.
That’s all in the context of debates between longtermist strategies and candidate megaprojects, which is what I usually assume is the discussion context. You could have a separate question that’s like ‘maybe I should give up on ~all the value in the universe and have a few years of fun playing sudoku and watching Netflix shows before AI kills me’.
In that context, the basic logic of anti-Pascalian reasoning still applies (easy existence proof: if working hard on x-risk raised humanity’s odds of survival from 1/1010100 to 5/1010100, it would obviously not be worth working hard on x-risk), but I don’t think we’re anywhere near the levels of P(doom) that would warrant giving up on the future.
‘There’s no need to work on supervolcano-destroying-humanity risk when there are much more plausible risks like bioterrorism-destroying-humanity to worry about’ is a very different sort of mental move than ‘welp, humanity’s odds of surviving are merely 1-in-100, I guess the reasonable utility-maximizing thing to do now is to play sudoku and binge Netflix for a few years and then die’. 1-in-100 is a fake number I pulled out of a hat, but it’s an example of a very dire number that’s obviously way too high to justify humanity giving up on its future.
(This is all orthogonal to questions of motivation. Maybe, in order to avoid burning out, you need to take more vacation days while working on a dire-looking project, compared to the number of vacation days you’d need while working on an optimistic-looking project. That’s all still within the framework of ‘trying to do longtermist stuff’, while working with a human brain.)
One additional thing adding confusion is Nate Soares’ latest threads on wallowing* which… I think are probably compatible with all this, but I couldn’t pass the ITT on.
*I think his use of ‘wallowing’ is fairly nonstandard, you shouldn’t read into it until you’ve talked to him about it for at least an hour.
Where do I find these threads?
Ah, this was in-person. (“Threads” was more/differently metaphorical than usual)
I think that “resolve” is often a lie we tell ourselves to explain the discrepancies between stated and revealed preferences. I concede that if you took that pill, it would be evidence against my position (but, I believe you probably would not).
A nuance to keep in mind is that reciprocity can be a rational motivation to behave more altruistically that you otherwise would. This can come about from tit-for-tat / reputation systems, or even from some kind of acausal cooperation. Reciprocity is effectively moving us closer to utilitarianism, but certainly not all the way there.
So, if I’m weighing the life of my son or daughter against an intergalatic network of civilizations, which I never heard of before and never going to hear about after, and which wouldn’t even reciprocate in a symmetric scenario, I’m choosing my child for sure.
If I knew as a certainty that I cannot do nearly as much good some other way, and I was certain that taking the pill causes that much good, I’d take the pill, even if I die after the torture and no one will know I sacrificed myself for others.
I admit those are quite unusual values for a human, and I’m not arguing about that it would be rational because of utilitarianism or so, just that I would do it. (Possible that I’m wrong, but I think very likely I’m not.) Also, I see that the way my brain is wired outer optimization pushes against that policy, and I think I probably wouldn’t be able to take the pill a second time under the same conditions (given that I don’t die after torture), or at least not often.
I don’t think those are unusual values for a human. Many humans have sacrificed their lives (and endured great hardship and pain, etc.) to help others. And many more would take a pill to gain that quality, seeing it as a more courageous and high-integrity expression of their values.
I’d do this to save ten planets of worth of thriving civilizations, but doing it to produce ten planets worth of thriving civilizations seems unreasonable to me. Nobody is harmed by preventing their birth, and I have very little confidence either way as to whether their existence will wind up increasing the average utility of all lives ever eventually lived.
I used to favour average utilitarianism too, until I learned about the sadistic conclusion. That was sufficient to overcome any aversion I had to the repugnant conclusion.
I’m happy to accept the sadistic conclusion as normally stated, and in general I find “what would I prefer if I were behind the Rawlsian Veil and going to be assigned at random to one of the lives ever actually lived” an extremely compelling intuition pump. (Though there are other edge cases that I feel weirder about, e.g. is a universe where everyone has very negative utility really improved by adding lots of new people of only somewhat negative utility?)
As a practical matter, though, I’m most concerned that total utilitarianism could (not just theoretically but actually, with decisions that might be locked in in our lifetimes) turn a “good” post-singularity future into a Malthusian near-hell where everyone is significantly worse off than I am now, whereas the sadistic conclusion and other contrived counterintuitive edge cases are unlikely to resemble decisions humanity or an AGI we create will actually face. Preventing the lock-in of total utilitarian values therefore seems only a little less important to me than preventing extinction.
Another question. Imagine a universe with either only 5 or 10 people. If they’re all being tortured equally badly at a level of −100 utility, are you sure you’re indifferent as to the number of people existing? Isn’t less better here?
Yeah that’s essentially the example I mentioned that seems weirder to me, but I’m not sure, and at any rate it seems much further from the sorts of decisions I actually expect humanity to have to make than the need to avoid Malthusian futures.
Well, that’s it, your access to the medicine cabinet is revoked. :-)
You can say that it’s wrong to think you can actually measure and usefully aggregate utility functions. That’s truly a matter of fact.
… but to be able to say that utilitarianism in all its forms was “wrong” would require an external standard. Ethical realism really is wrong.
Utilitarianism as a metaethical theory can be wrong without being ethically wrong. Metaethical theories can be criticised for being contradictory, unworkable, irrelevant, etc.
Utilitarianism can be wrong as a description of actual human values, or of ‘the values humans would self-modify to if they fully understood the consequences of various self-modification paths’.
OK, but that’s an is-ought issue. I didn’t perceive the question as being about factual human values, but about what people should do. It’s an ethical system, after all, not a scientific system.
Your definition is wrong; I think that way of defining ‘utilitarianism’ is purely an invention of a few LWers who didn’t understand what the term meant and got it mixed up with ‘utility function’. AFAIK, there’s no field where ‘utilitarianism’ has ever been used to mean ‘having a utility function’.
I had this confusion for a few years. It personally made me dislike the term utilitarian, because it really mismatched my internal ontology.
Hm, I worry I might be a confused LWer. I definitely agree that “having a utility function” and “being a utilitarian” are not identical concepts, but they’re highly related, no? Would you agree that, to a first approximation, being a utilitarian means having a utility function with the evolutionary godshatter as terminal values? Even this is not identical to the original philosophical meaning, I suppose, but it seems highly similar, and it is what I thought people around here meant.
This is not even close to correct, I’m afraid. In fact being a utilitarian has nothing whatever to do with the concept of a utility function. (Nor—separately—does it have much to do with “evolutionary godshatter” as values; I am not sure where you got this idea!)
Please read this page for some more info presented in a systematic way.
I meant to convey a utility function with certain human values as terminal values, such as pleasure, freedom, beauty, etc.; godshatter was a stand-in.
If the idea of a utility function has literally nothing to do with moral utilitarianism, even around here, then I would question why Eliezer references expected utility calculations in the post above when he is discussing moral questions. I would also point to “intuitions behind utilitarianism” as pointing at connections between the two. Or “shut up and multiply”? Need I go on?
I know classical utilitarianism is not exactly the same, but even in what you linked, it talks about maximizing the total sum of human happiness and sacrificing some goods for others, measured under a single metric “utility”. That sounds an awful lot like a utility function trading off human terminal values? I don’t see how what I’m pointing at isn’t just a straightforward idealization of classical utilitarianism.
Yes, I understood your meaning. My response stands.
What is the connection? Expected utility calculations can be, and are, relevant to all sorts of things, without being identical to, or similar to, or inherently connected with, etc., utilitarianism.
The linked post makes some subtle points, as well as some subtle mistakes (or, perhaps, instances of unclear writing on Eliezer’s part; it’s hard to tell).
The “utility” of utilitarianism and the “utility” of expected utility theory are two very different concepts that, quite unfortunately and confusingly, share a term. This is a terminological conflation, in other words.
Here is a long explanation of the difference.
None of what you have linked so far has particularly conveyed any new information to me, so I think I just flatly disagree with you. As that link says, the “utility” in utilitarianism just means some metric or metrics of “good”. People disagree about what exactly should go into “good” here, but godshatter refers to all the terminal values humans have, so that seems like a perfectly fine candidate for what the “utility” in utilitarianism ought to be. The classic “higher pleasures” in utilitarianism lends credence toward this fitting into the classical framework; it is not a new idea that utilitarianism can include multiple terminal values with relative weighting.
Under utilitarianism, we are then supposed to maximize this utility. That is, maximize the satisfaction of the various terminal goals we are taking as good, aggregated into a single metric. And separately, there happens to be this elegant idea called “utility theory”, which tells us that if you have various preferences you are trying to maximize, there is a uniquely rational way to do that, which involves giving them relative weights and aggregating into a single metric… You seriously think there’s no connection here? I honestly thought all this was obvious.
In that last link, they say “Now, it is sometimes claimed that one may use decision-theoretic utility as one possible implementation of the utilitarian’s ‘utility’” then go on to say why this is wrong, but I don’t find it to be a knockdown argument; that is basically what I believe and I think I stand by it. Like, if you plug “aggregate human well-being along all relevant dimensions” into the utility of utility theory, I don’t see how you don’t get exactly utilitarianism out of that, or at least one version of it?
EDIT: Please also see in the above post under “You should never try to reason using expected utilities again. It is an art not meant for you. Stick to intuitive feelings henceforth.” It seems to me that Eliezer goes on to consistently use the “expected utilities” of utility theory to be synonymous to the “utilities” of utilitarianism and the “consequences” of consequentialism. Do you agree that he’s doing this? If so, I assume you think he’s wrong for doing it? Eliezer tends to call himself a utilitarian. Do you agree that he is one, or is he something else? What would you call “using expected utility theory to make moral decisions, taking the terminal value to be human well-being”?
You don’t get utilitarianism out of it because, as explained at the link, VNM utility is incomparable between agents (and therefore cannot be aggregated across agents). There are no versions of utilitarianism that can be constructed out of decision-theoretic utility. This is an inseparable part of the VNM formalism.
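(A compressed version of why the aggregation step fails, as I understand the standard argument rather than as a quote from the link: each agent’s VNM utility function is only pinned down up to a positive affine transformation,

$$u_i'(x) = a_i\,u_i(x) + b_i, \qquad a_i > 0,$$

which represents exactly the same preferences for agent $i$; but $\sum_i u_i(x)$ and $\sum_i u_i'(x)$ can rank outcomes differently, and nothing in the formalism privileges one choice of the scale factors $a_i$. So “add up everyone’s VNM utilities” isn’t a well-defined prescription without importing extra, non-VNM structure.)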
That having been said, even if it were possible to use VNM utility as the “utility” of utilitarianism (again, it is definitely not!), that still wouldn’t make them the same theory, or necessarily connected, or conceptually identical, or conceptually related, etc. Decision-theoretic expected utility theory isn’t a moral theory at all.
Really, this is all explained in the linked post…
Re: the “EDIT:” part:
No, I do not agree that he’s doing this.
Yes, he’s a utilitarian. (“Torture vs. Dust Specks” is a paradigmatic utilitarian argument.)
I would call that “being confused”.
How to (coherently, accurately, etc.) map “human well-being” (whatever that is) to any usable scalar (not vector!) “utility” which you can then maximize the expectation of, is probably the biggest challenge and obstacle to any attempt at formulating a moral theory around the intuition you describe. (“Utilitarianism using VNM utility” is a classic failed and provably unworkable attempt at doing this.)
If you don’t have any way of doing this, you don’t have a moral theory—you have nothing.
If he has a proof that utilitarianism, as usually defined (the highly altruistic ethical theory), is equivalent to maximization of an arbitrary UF, given some considerations about coherence, then he has something extraordinary that should be widely known.
Or he is using “utilitarianism” in a weird way… or he is not, and he is just confused.
I said nothing about an arbitrary utility function (nor proof for that matter). I was saying that applying utility theory to a specific set of terminal values seems to basically get you an idealized version of utilitarianism, which is what I thought the standard moral theory was around here.
If you know the utility function that is objectively correct, then you have the correct metaethics, and VNM-style utility maximisation only tells you how to implement it efficiently.
The first thing is “utilitarianism is true”, the second thing is “rationality is useful”.
But that goes back to the issue everyone criticises: EY recommends an object-level decision (prefer torture to dust specks) unconditionally, without knowing the reader’s UF.
If he had succeeded in arguing, or even tried to argue, that there is one true objective UF, then he would be in a position to hand out unconditional advice.
Or if he could show that preferring torture to dust specks was rational given an arbitrary UF, then he could also hand out unconditional advice (in the sense that conditioning on a subjective UF doesn’t make a difference). But he doesn’t do that, because if someone has a UF that places negative-infinity utility on torture, that’s not up for grabs… their personal UF is what it is.
Are you alluding to agents with VNM utility functions here?
I’m endorsing VNM. The confusion I’m talking about is things like, agents with non-finite or unbounded utility functions, agents with discontinuous utility functions, the paradoxes of population ethics et cetera.
I assign much lower value than a lot of people here to some vast expansionist future… and I suspect that even if I’m in the minority, I’m not the only one.
It’s not an arithmetic error.
Can you be more explicit about the arithmetic? Would increasing the probability of civilization existing 1000 years from now from 10^{-7} to 10^{-6} be worth more or less to you than receiving a billion dollars right now?
Do I get any information about what kind of civilization I’m getting, and/or about what it would be doing during the 1000 years or after the 1000 years?
On edit: Removed the “by how much” because I figured out how to read the notation that gave the answer.
I guess by “civilization” I meant “civilization whose main story is still being meaningfully controlled by humans who are individually similar to modern humans”. Other than that, I just mean your current expectations about what that civilization is like, conditioned on it existing.
(It seems like you could be disagreeing with “a lot of people here” about what those futures look like or how valuable they are or both—I’d be happy to get clarification on either front.)
Hmm. I should have asked what the alternative to civilization was going to be.
Nailing it down to a very specific question, suppose my alternatives are...
I get a billion dollars. My life goes on as normal otherwise. Civilization does whatever it’s going to do; I’m not told what. Omega tells me that everybody will suddenly drop dead at some time within 1000 years, for reasons I don’t get to know, with probability one minus one in ten million.
… versus...
I do not get a billion dollars. My life goes on as normal otherwise. Civilization does whatever it’s going to do; I’m not told what. Omega tells me that everybody will suddenly drop dead at some time within 1000 years, for reasons I don’t get to know, with probability one minus one in one million.
… then I don’t think I take the billion dollars. Honestly the only really interesting thing I can think of to do with that kind of money would be to play around with the future of civilization anyway.
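(For what it’s worth, here’s the back-of-the-envelope arithmetic behind turning the billion down in that setup, with made-up precision just to make the implied stakes explicit:)

```python
# Survival probabilities in the two branches of the hypothetical above.
p_take_money = 1e-7   # take the $1B: one-in-ten-million chance everyone survives
p_refuse     = 1e-6   # refuse it: one-in-a-million chance everyone survives
money        = 1e9    # dollars on offer

delta_p = p_refuse - p_take_money   # 9e-7 of extra survival probability
breakeven = money / delta_p         # value the surviving future must exceed to justify refusing
print(f"{breakeven:.2e}")           # ~1.11e+15 dollar-equivalents
```

So refusing is only the naive expected-value play if I value that surviving future at something like a quadrillion dollar-equivalents or more, which is part of why the answer turns so heavily on what that future actually looks like.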
I think that’s probably the question you meant to ask.
However, that’s a very, very specific question, and there are lots of other hypotheticals you could come up with.
The “civilization whose main story is still being meaningfully controlled by humans etc.” thing bothers me. If a utopian godlike friendly AI were somehow on offer, I would actively pay money to take control away from humans and hand it to that AI… especially if I or people I personally care about had to live in that world. And there could also be valuable modes of human life other than civilization. Or even nonhuman things that might be more valuable. If those were my alternatives, and I knew that to be the case, then my answer might change.
For that matter, even if everybody were somehow going to die, my answer could depend on how civilization was going to end and what it was going to do before ending. A jerkass genie Omega might be withholding information and offering me a bum deal.
Suppose I knew that civilization would end because everybody had agreed, for reasons I cannot at this time guess, that the project was in some sense finished, all the value extracted, so they would just stop reproducing and die out quietly… and, perhaps implausibly, that conclusion wasn’t the result of some kind of fucked up mind control. I wouldn’t want to second-guess the future on that.
Similarly, what if I knew civilization would end because the alternative was some also as yet unforeseen fate worse than death? I wouldn’t want to avoid x-risk by converting it into s-risk.
In reality, of course, nobody’s offering me clearcut choices at all. I kind of bumble along, and thereby I (and of course others) sculpt my future light cone into some kind of “work of art” in some largely unpredictable way.
Basically what I’m saying is that, insofar as I consciously control that work of art, pure size isn’t the aesthetic I’m looking for. Beyond a certain point, size might be a negative. 1000 years is one thing, but vast numbers of humans overrunning galaxy after galaxy over billions of years, while basically doing the same old stuff, seems pointless to me.
Thanks for all the detail, and for looking past my clumsy questions!
It sounds like one disagreement you’re pointing at is about the shape of possible futures. You value “humanity colonizes the universe” far less than some other people do (maybe Rob in particular?). That seems sane to me.
The near-term decision questions that brought us here were about how hard to fight to “solve the alignment problem,” whatever that means. For that, the real question is about the difference in total value of the future conditioned on “solving” it and conditioned on “not solving” it.
You think there are plausible distributions on future outcomes so that 1 one-millionth of the expected value of those futures is worth more to you than personally receiving 1 billion dollars.
Putting these bits together, I would guess that the amount of value at stake is not really the thing driving the disagreement here, but rather the level of futility? Say you think humanity overall has about a 1% chance of succeeding with a current team of 1000 full-time-equivalents working on the problem. Do you want to join the team in that case? What if we have a one-in-one-thousand chance and a current team of 1 million? Do these seem like the right units to talk about the disagreement in?
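(One crude way to put numbers on the “right units” question, purely as a toy model of my own rather than anything I’d defend hard: treat an individual’s marginal effect as an equal share of the team’s overall success probability.)

```python
# Toy model: per-person leverage ~ team's success probability / team size.
def per_person_share(p_success, team_size):
    return p_success / team_size

print(per_person_share(0.01, 1_000))       # 1e-05 -- 1% chance, 1,000 FTEs
print(per_person_share(0.001, 1_000_000))  # 1e-09 -- 0.1% chance, 1,000,000 FTEs
```

Under that (very debatable) model the two scenarios differ by four orders of magnitude in per-person leverage, which is why the team size seems to matter as much as the headline probability.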
(Another place that I thought there might be a disagreement: do you think solving the alignment problem increases or decreases s-risk? Here “solving the alignment problem” is the thing that we’re discussing giving up on because it’s too futile.)
In some philosophical sense, you have to multiply the expected value by the estimated chance of success. They both count. But I’m not sitting there actually doing multiplication, because I don’t think you can put good enough estimates on either one to make the result meaningful.
In fact, I guess that there’s a better than 1 percent chance of avoiding AI catastrophe in real life, although I’m not sure I’d want to (a) put a number on it, (b) guess how much of the hope is in “solving alignment” versus the problem just not being what people think it will be, (c) guess how much influence my or anybody else’s actions would have on moving the probability[edited from “property”...], or even (d) necessarily commit to very many guesses about which actions would move the probability in which directions. I’m just generally not convinced that the whole thing is predictable down to 1 percent at all.
In any case, I am not in fact working on it.
I don’t actually know what values I would put on a lot of futures, even the 1000-year one. Don’t get hung up on the billion dollars, because I also wouldn’t take a billion dollars to singlemindedly dedicate the remainder of my life, or even my “working time”, to anything in particular unless I enjoyed it. Enjoying life is something you can do with relative certainty, and it can be enough even if you then die. That can be a big enough “work of art”. Everybody up to this point has in fact died, and they did OK.
For that matter, I’m about 60 years old, so I’m personally likely to die before any of this stuff happens… although I do have a child and would very much prefer she didn’t have to deal with anything too awful.
I guess I’d probably work on it if I thought I had a large, clear contribution to make to it, but in fact I have absolutely no idea at all how to do it, and no reason to expect I’m unusually talented at anything that would actually advance it.
If you ended up enacting a serious s-risk, I don’t understand how you could say you’d solved the alignment problem. At least not unless the values you were aligning with were pretty ugly ones.
I will admit that sometimes I think other people’s ideas of good outcomes sound closer to s-risks than I would like, though. If you solved the problem of aligning with those people, I might see it as an increase.
Have you considered local movement building? Perhaps, something simple like organising dinners or a reading group to discuss these issues? Maybe no-one would come, but it’s hard to say unless you give it a go and, in any case, a small group of two or three thoughtful people is more valuable than a much larger group of people who are just there to pontificate without really thinking through anything deeply.
AI alignment isn’t the only option to improve the EV of the long-term future, though.
This is Pascal’s Mugging.
Previously, comparisons between the case for AI xrisk mitigation and Pascal’s Mugging were rightly dismissed on the grounds that the probability of AI xrisk is not actually that small at all. But if the probability of averting the xrisk is as small as discussed here, then the comparison with Pascal’s Mugging is entirely appropriate.
It’s not Pascal’s mugging in the senses described in the first posts about the problem:
Quoting from Being Half-Rational About Pascal’s Wager is Even Worse:
If I understand your argument, it’s something like “when the probability of the world being saved is below n%, humanity should stop putting any effort into saving the world”. Could you clarify what value of n (roughly) you think justifies “let’s give up”?
(If we just speak in qualitative terms, we’re more likely to just talk past each other. E.g., making up numbers: maybe you’ll say ‘we should give up if the world is only one-in-a-million likely to survive’, and Eliezer will reply ‘oh, of course, but our survival odds are way higher than that’. Or maybe you’ll say ‘we should give up if the world is only one-in-fifty likely to survive’, and Eliezer will say ‘that sounds like the right ballpark for how dire our situation is, but that seems way too early to simply give up’.)
I think:
- Humans are bad at informal reasoning about small probabilities, since they don’t have much experience to calibrate on, and will tend to overestimate the ones brought to their attention, so informal estimates of the probability of very unlikely events should usually be adjusted even lower.
- Humans are bad at reasoning about large utilities, due to lack of experience as well as issues with population ethics and the mathematical issues with unbounded utility, so estimates of large utilities of outcomes should usually be adjusted lower.
- Throwing away most of the value in the typical case for the sake of an unlikely case seems like a dubious idea to me even if your probabilities and utility estimates are entirely correct; the lifespan dilemma and similar results are potential intuition pumps about the issues with this, and go through even with only single-exponential utilities at each stage. Accordingly I lean towards overweighting the typical range of outcomes in my decision theory relative to extreme outcomes, though there are certainly issues with this approach as well.
As far as where the penalty starts kicking in quantitatively, for personal decisionmaking I’d say somewhere around “unlikely enough that you expect to see events at least this extreme less than once per lifetime”, and for altruistic decisionmaking “unlikely enough that you expect to see events at least this extreme less than once in the history of humanity”. For something on the scale of AI alignment I think that’s around 1/1000? If you think the chances of success are still over 1% then I withdraw my objection.
The Pascalian concern aside, I note that the probability of AI alignment succeeding doesn’t have to be *that* low before its worthwhileness becomes sensitive to controversial population-ethics questions. If you don’t consider lives averted to be a harm, then spending $10B to decrease the chance of 10 billion deaths by 1/10000 is worse value than AMF. If you’re optimizing for the average utility of all lives eventually lived, then increasing the chance of a flourishing future civilization to pull up the average is likely worth more, but plausibly only ~100x more (how many people would accept a 1% chance of postsingularity life for a 99% chance of immediate death?), so it’d still be a bad bet below 1/1000000. (Also, if decreasing xrisk increases srisk, or if the future ends up run by total utilitarians, it might actually pull the average down.)
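(Spelling out the first comparison above; the AMF cost-per-life figure is the commonly cited rough order of magnitude, supplied by me rather than anything precise:)

```python
spend = 10e9                # $10B spent on reducing the risk
deaths_at_stake = 10e9      # 10 billion deaths on the line
risk_reduction = 1e-4       # probability of those deaths reduced by 1/10000

expected_deaths_averted = deaths_at_stake * risk_reduction  # 1,000,000
cost_per_death_averted = spend / expected_deaths_averted    # $10,000 per expected death averted
amf_rough_cost_per_life = 5_000                             # rough, commonly cited order of magnitude for AMF
print(cost_per_death_averted, amf_rough_cost_per_life)
```

On those numbers the x-risk spend comes in around $10,000 per expected death averted, a few times worse than the usual AMF estimates, which is the sense in which it’s worse value if you only count deaths averted.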
If I’m understanding the original question correctly (and if not, well, I’m asking it myself), the issue is that, as you just pointed out, there are plenty of non-AI-related massive threats to humanity that we may be able to avert with far higher likelihood (assuming we survive long enough to be able to do so). If the probability of changing the AGI end-of-the-world situation is extremely low, and if that were the only potential danger to humanity, then of course we should still focus on it. However, we also face many other risks we actually stand a chance of fighting, and according to Yudkowsky’s line of thinking, we should act for the counterfactual world in which we somehow solve the alignment problem. Therefore, shouldn’t we be focusing more on other issues, if the probabilities are really that bad?
Is voting a Pascal’s mugging?
There’s some case for it, but I’d generally say no. Usually when voting you are coordinating with a group of people with similar decision algorithms who you have some ability to communicate with; the chance of your whole coordinated group changing the outcome is fairly large, and your own contribution to it is pretty legible. This is perhaps analogous to being one of many people working on AI safety, if you believe that the chance that some organization solves AI safety is fairly high (it’s unlikely that your own contributions will make the difference, but you’re part of a coordinated effort that likely will). But if you believe it is extremely unlikely that anybody will solve AI safety, then the whole coordinated effort is being Pascal-mugged.
One thing I like about the “dignity as log-odds” framework is that it implicitly centers coordination.
To be clear:
If the probability of success goes small enough, then yes, you give up on the long-term future. But I don’t think we can realistically reach that point, even if the situation gets a lot more dire than it looks today. It matters what the actual probabilities look like (e.g., 1 in 20 and 1 in 1000 are both very different from 1 in googolplex).
None of this implies that you should disregard your own happiness, stop caring about your friends, etc. Even if human brains were strictly utilitarian (which they’re not), disregarding your happiness, being a jerk, etc. are obviously terrible strategies for optimizing the long-term future. Genuinely taking the enormous stakes into account in your decision-making doesn’t require that you adopt dumb or self-destructive policies that inevitably cause burn-out, poor coordination, etc.