Before my rejection of faith, I was plagued by a feeling of impending doom.
I was a happy atheist until I learned about the Friendly AI problem and estimated the likely outcome. I am now plagued by a feeling of impending doom.
If everyone’s inferred utility goes from 0 to 1, and the real-life utility monster cares more than the other people about one thing, the inferred utility will say he cares less than other people about something else. Let him play that game until the something else happens, then he loses, and that’s a fine outcome.
That’s not the situation I’m describing; if 0 is “you and all your friends and relatives getting tortured to death” and 1 is “getting everything you want,” the utility monster is someone who puts “not getting one thing I want” at, say, .1 whereas normal people put it at .9999.
You have failed to disagree with me. My proposal exactly fits your alleged counterexample.
Suppose Alice is a utility monster where:
U(Alice, torture of everybody) = 0
U(Alice, everything) = 1
U(Alice, no cookie) = 0.1
U(Alice, Alice dies) = 0.05
And Bob is normal, except he doesn’t like Alice:
U(Bob, torture of everybody) = 0
U(Bob, everything) = 1
U(Bob, Alice lives, no cookie) = 0.8
U(Bob, Alice dies, no cookie) = 0.9
If the FAI has a cookie it can give to Bob or Alice, it will give it to Alice, since U(cookie to Bob) = U(Bob, everything) + U(Alice, everything but a cookie) = 1 + 0.1 = 1.1 < U(cookie to Alice) = U(Bob, everything but a cookie) + U(Alice, everything) = 0.8 + 1 = 1.8. Thus Alice gets her intended reward for being a utility monster.
However, if there are no cookies available and the FAI can kill Alice, it will do so for the benefit of Bob, since U(Bob, Alice lives, no cookie) + U(Alice, no cookie) = 0.8 + 0.1 = 0.9 < U(Bob, Alice dies, no cookie) + U(Alice, Alice dies) = 0.9 + 0.05 = 0.95. The basic problem is that Alice’s cookie fixation ate up so much of her utility range that her desire to live in the absence of the cookie was outweighed by Bob finding her irritating.
Another problem with Alice’s utility is that it supports the FAI doing lotteries that Alice would apparently prefer but a normal person would not. For example, assuming the outcome for Bob does not change, the FAI should prefer 50% Alice dies + 50% Alice gets a cookie (adds to 0.525) over 100% Alice lives without a cookie (which is 0.1). This is a different issue from interpersonal utility comparison.
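For anyone who wants to check the arithmetic, here is a minimal Python sketch of the three comparisons above; the variable names are mine, and the numbers are just the utilities already assigned to Alice and Bob in this example.

```python
# Utility numbers from the example above (0 = everyone you care about tortured, 1 = everything you want).
alice = {"everything": 1.0, "no_cookie": 0.1, "dies": 0.05}
bob = {"everything": 1.0, "alice_lives_no_cookie": 0.8, "alice_dies_no_cookie": 0.9}

# One cookie to give away: compare the summed utilities of each allocation.
cookie_to_bob = bob["everything"] + alice["no_cookie"]                # 1.0 + 0.1 = 1.1
cookie_to_alice = bob["alice_lives_no_cookie"] + alice["everything"]  # 0.8 + 1.0 = 1.8
assert cookie_to_alice > cookie_to_bob   # the FAI gives the cookie to Alice

# No cookie available: does the FAI kill Alice for Bob's benefit?
alice_lives = bob["alice_lives_no_cookie"] + alice["no_cookie"]  # 0.8 + 0.1  = 0.90
alice_dies = bob["alice_dies_no_cookie"] + alice["dies"]         # 0.9 + 0.05 = 0.95
assert alice_dies > alice_lives          # yes, it kills Alice

# The lottery: 50% Alice dies + 50% Alice gets a cookie, vs. 100% Alice lives without a cookie.
lottery = 0.5 * alice["dies"] + 0.5 * alice["everything"]  # 0.525
certain = alice["no_cookie"]                               # 0.1
assert lottery > certain                 # Alice's stated utilities prefer the gamble
```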
How do you add two utilities together?
They are numbers. Add them.
And if humans turn out to be adaptation-executers, then utility is going to look really weird, because it’ll depend a lot on framing and behavior.
Yes. So far as I can tell, if the FAI is going to do what people want, it has to model people as though they want something, and that means ascribing utility functions to them. Better alternatives are welcome. Giving up because it’s a hard problem is not welcome.
If people dislike losses more than they like gains and status is zero-sum, does that mean the reasonable result of average utilitarianism when applied to status is that everyone must be exactly the same status?
No. If Alice has high status and Bob has low status, and the FAI takes action to lower Alice’s status and raise Bob’s, and people hate losing, then Alice’s utility decrease will exceed Bob’s utility increase, so the FAI will prefer to leave the status as it is. Similarly, the FAI isn’t going to want to increase Alice’s status at the expense of Bob. The FAI just won’t get involved in the status battles.
I have not found this conversation rewarding. Unless there’s an obvious improvement in the quality of your arguments, I’ll drop out.
Edit: Fixed the math on the FAI-kills-Alice scenario. Vaniver continued to change the topic with every turn, so I won’t be continuing the conversation.
There seems to be an assumption here that empathy leads to morality. Sometimes, at least, empathy leads to being jerked around by the stupid goals of others instead of pursuing your own stupid goals, and in this case it’s not all that likely to lead to something fitting any plausible definition of “moral behavior”. Chogyam Trungpa called this “idiot compassion”.
Thus it’s important to distinguish caring about humanity as a whole from caring about individual humans. I read some of the links in the OP and did not see this distinction mentioned.
I procrastinated when in academia, but did not feel particularly attracted to the job, so option 1 is not always true. Comparison with people not in academia makes it seem that option 3 is not true for me either.
More questions to perhaps add:
What is self-modification? (In particular, does having one AI build another bigger and more wonderful AI while leaving “itself” intact count as self-modification? The naive answer is “no”, but I gather the informed answer is “yes”, so you’ll want to clarify this before using the term.)
What is wrong with the simplest decision theory? (That is, enumerate the possible actions and pick the one for which the expected utility of the outcome is best; a sketch of what I mean follows these questions. I’m not sure what the standard name for that is.) It’s important to answer this so that at some point you state the problem that timeless decision theory etc. are meant to solve.
I gather one of the problems with the simplest decision theory is that it gives the AI an incentive to self-modify under certain circumstances, and there’s a perceived need for the AI to avoid routine self-modification. The FAQ question might be “How can we avoid giving the AI an incentive to self-modify?” and perhaps “What are the risks of allowing the AI to self-modify?”
What problem is solved by extrapolation? (This goes in the CEV section.)
What are the advantages and disadvantages of having a bounded utility function?
Can we just upload a moral person? (In the “Need for FAI” section. IMO the answer is a clear “no”.)
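As promised above, here is a minimal sketch of the “simplest decision theory” I mean; this is my own illustration (the names are made up), not anything from the FAQ: enumerate the possible actions and pick the one with the best expected utility.

```python
def best_action(actions, outcome_probs, utility):
    """Pick the action with the highest expected utility.

    actions: iterable of possible actions.
    outcome_probs: function mapping an action to a dict {outcome: probability}.
    utility: function mapping an outcome to a real number.
    """
    def expected_utility(action):
        return sum(p * utility(outcome)
                   for outcome, p in outcome_probs(action).items())
    return max(actions, key=expected_utility)
```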
I suggest rephrasing “What powers might it have?” in 1.10 to “What could we reasonably expect it to be able to do?”. The common phrase “magical powers” gives the word “powers” undesired connotations in this context and makes us sound like loonies.
A common tactic in human interaction is to care about everything more than the other person does, and explode (or become depressed) when they don’t get their way. How should such real-life utility monsters be dealt with?
If everyone’s inferred utility goes from 0 to 1, and the real-life utility monster cares more than the other people about one thing, the inferred utility will say he cares less than other people about something else. Let him play that game until the something else happens, then he loses, and that’s a fine outcome.
I doubt it can measure utilities
I think it can, in principle, estimate utilities from behavior. See http://www.fungible.com/respect.
simple average utilitarianism is so wracked with problems I’m not even sure where to begin.
The problems I’m aware of have to do with creating new people. If you assume a fixed population and humans who have comparable utilities as described above, are there any problems left? Creating new people is a more interesting use case than status conflicts.
Why do you find status uninteresting?
As I said, because maximizing average utility seems to get a reasonable result in that case.
Its understanding of you doesn’t have to be more rigorous than your understanding of you.
It does if I want it to give me results any better than I can provide for myself.
No. For example, if it develops some diet drug that lets you safely enjoy eating and still stay skinny and beautiful, that might be a better result than you could provide for yourself, and it doesn’t need any special understanding of you to make that happen. It just makes the drug, makes sure you know the consequences of taking it, and offers it to you. If you choose to take it, that tells the AI more about your preferences, but there’s no profound understanding of psychology required.
I also provided the trivial example of internal conflicts; external conflicts are much more problematic.
Putting an inferior argument first is good if you want to try to get the last word, but it’s not a useful part of problem solving. You should try to find the clearest problem where solving that problem solves all the other ones.
How will a FAI deal with the status conflicts that develop?
If it can do a reasonable job of comparing utilities across people, then maximizing average utility seems to do the right thing here. Comparing utilities between arbitrary rational agents doesn’t work, but comparing utilities between humans seems to—there’s an approximate universal maximum (getting everything you want) and an approximate universal minimum (you and all your friends and relatives getting tortured to death). Status conflicts are not one of the interesting use cases. Do you have anything better?
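To make “comparing utilities between humans” concrete, here is a minimal sketch of the kind of normalization I have in mind; the formalization and names are mine, not a worked-out proposal. Each person’s utility is rescaled so the approximate universal minimum maps to 0 and the approximate universal maximum maps to 1, and the FAI maximizes the average of the rescaled utilities.

```python
def normalize(u, u_min, u_max):
    """Rescale raw utility u so that u_min (you and all your friends and relatives
    getting tortured to death) maps to 0 and u_max (getting everything you want)
    maps to 1."""
    return (u - u_min) / (u_max - u_min)

def average_utility(outcome, people):
    """people: a list of (raw_utility_fn, u_min, u_max) triples, one per person."""
    return sum(normalize(u(outcome), lo, hi) for u, lo, hi in people) / len(people)

# The FAI would then choose the feasible outcome that maximizes average_utility.
```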
In some sense, the problem of FAI is the problem of rigorously understanding humans, and evo psych suggests that will be a massively difficult problem.
I think that bar is unreasonably high. If you have a conflict between enjoying eating a lot vs. being skinny and beautiful, and the FAI helps you do one or the other, then you aren’t in a position to complain that it did the wrong thing. Its understanding of you doesn’t have to be more rigorous than your understanding of you.
For example, maybe you could chill the body rapidly to organ-donation temperatures, garrote the neck, ...
It’s worse than I said, by the way. If the patient is donating kidneys and is brain dead, the cryonics people want the suspension to happen as soon as possible to minimize further brain damage. The organ donation people want the organ donation to happen when the surgical team and recipient are ready, so there will be conflict over the schedule.
In any case, the fraction of organ donors is small, and the fraction of cryonics cases is much smaller, and the two groups do not have a history of working with each other. Thus even if the procedure is technically possible, I don’t know of an individual who would be interested in developing the hybrid procedure. There’s lots of other stuff that is more important to everyone involved.
I would think that knowing evo psych is enough to realize [having an FAI find out human preferences, and then do them] is a dodgy approach at best.
I don’t see the connection, but I do care about the issue. Can you attempt to state an argument for that?
Human preferences are an imperfect abstraction. People talk about them all the time and reason usefully about them, so either an AI could do the same, or you found a counterexample to the Church-Turing thesis. “Human preferences” is a useful concept no matter where those preferences come from, so evo psych doesn’t matter.
Similarly, my left hand is an imperfect abstraction. Blood flows in, blood flows out, flakes of skin fall off, it gets randomly contaminated from the environment, and the boundaries aren’t exactly defined, but nevertheless it generally does make sense to think in terms of my left hand.
If you’re going to argue that FAI defined in terms of inferring human preferences can’t work, I hope that isn’t also going to be an argument that an AI can’t possibly use the concept of my left hand, since the latter conclusion would be absurd.
The process of vitrifying the head makes the rest of the body unsuitable for organ donations. If the organs are extracted first, then the large resulting leaks in the circulatory system make perfusing the brain difficult. If the organs are extracted after the brain is properly perfused, they’ve been perfused too, and with the wrong substances for the purposes of organ donation.
If “humility” can be used to justify both activities and their opposites so easily, perhaps it’s a useless concept and should be tabooed.
PMing or emailing official SIAI people should get you a link to safer avenues for discussing these kinds of basilisks.
Hmm, should I vote you up because what you’re saying is true, or should I vote you down because you are attracting attention to the parent post, which is harmful to think about?
If an idea is guessable, then it seems irrational to think it is harmful to communicate it to somebody, since they could have guessed it themselves. Given that this is a website about rationality, IMO we should be able to talk about the chain of reasoning that leads to the decision that this guessable idea is harmful to communicate, since there’s clearly a flaw in there somewhere.
Upvoted the parent because I think the harm here is imaginary. Absurdly large utilities do not describe non-absurdly-large brains, but they are not a surprising output from humans displaying fitness. (Hey, I know a large number! Look at me!)
These ideas have come up and were suppressed before, so this is not a specific criticism of the original post.
Make sure that each CSA above the lowest level actually has “could”, “should”, and “would” labels on the nodes in its problem space, and make sure that those labels, their values, and the problem space itself can be reduced to the managing of the CSAs on the level below.
That statement would be much more useful if you gave a specific example. I don’t see how labels on the nodes are supposed to influence the final result.
There’s a general principle here that I wish I could state well. It’s something like “general ideas are easy, specific workable proposals are hard, and you’re probably wasting people’s time if you’re only describing a solution to the easy parts of the problem”.
One cause of this is that anyone who can solve the hard part of the problem can probably already guess the easy part, so they don’t benefit much from you saying it. Another cause is that the solutions to the hard parts of the problem tend to have awkward aspects to them that are best dealt with by modifying the easy part, so a solution to just the easy part is sure to be unworkable in ways that can’t be seen if that’s all you have.
I have this issue with your original post, and most of the FAI work that’s out there.
Well, one story is that humans and brains are irrational, and then you don’t need a utility function or any other specific description of how it works. Just figure out what’s really there and model it.
The other story is that we’re hoping to make a Friendly AI that might make rational decisions to help people get what they want in some sense. The only way I can see to do that is to model people as though they actually want something, which seems to imply having a utility function that says what they want more and what they want less. Yes, it’s not true, people aren’t that rational, but if a FAI or anyone else is going to help you get what you want, it has to model you as wanting something (and as making mistakes when you don’t behave as though you want something).
So it comes down to this question: If I model you as using some parallel decision theory, and I want to help you get what you want, how do I extract “what you want” from the model without first somehow converting that model to one that has a utility function?
Okay, I watched End of Evangelion and a variety of the materials leading up to it. I want my time back. I don’t recommend it.
So many people might be willing to go be a health worker in a poor country where aid workers are commonly (1 in 10,000) raped or killed, even though they would not be willing to be certainly attacked in exchange for 10,000 times the benefits to others.
I agree with your main point, but the thought experiment seems to be based on the false assumption that the risk of being raped or murdered is smaller than 1 in 10,000 if you stay at home. Wikipedia guesstimates that 1 in 6 women in the US are on the receiving end of an attempted rape at some point, so someone who goes to a place with a 1 in 10,000 chance of being raped or murdered has probably improved their personal safety. To make a better thought experiment, I suppose you have to talk about the marginal increase in the rape or murder rate when working in the poor country compared to staying home, and perhaps you should stick to murder since the rape rate is so high.
The story isn’t working for me. A boy or novice soldier, depending on how you define it, is inexplicably given the job of running a huge and difficult-to-use robot to fight a sequence of powerful, similarly huge aliens while trying not to do too much collateral damage to Tokyo in the process. In the original, I gather he was an unhappy boy. In this story, he’s a relatively well-adjusted boy who hallucinates conversations with his Warhammer figurines. I don’t see why I should care about this scenario or any similar scenarios, but maybe I’m missing something.
Can someone who read this or watched the original say something interesting that happens in it? Wikipedia mentions profound philosophical questions about the nature of reality, but it also mentions that the ending is widely regarded as incomprehensible. The quote about how every possible statement sounds profound if you get the rhetoric right seems to apply here. I don’t want to invest multiple hours to end up reading (or watching) some pseudo-profound nonsense.
Your strength as a rationalist is your ability to be more confused by fiction than by reality.
Does that lead to the conclusion that Newcomb’s problem is irrelevant? Mind-reading aliens are pretty clearly fiction. Anyone who says otherwise is much more likely to be schizophrenic than to have actual information about mind-reading aliens.
A person’s behavior can always be understood as optimizing a utility function; it’s just that if they are irrational (as in the Allais paradox) the utility functions start to look ridiculously complex. If all else fails, a utility function can be used that has a strong dependency on time, in whatever way is required to match the observed behavior of the subject. “The subject had a strong preference for sneezing at 3:15:03pm on October 8, 2011.”
From the point of view of someone who wants to get FAI to work, the important question is, if the FAI does obey the axioms required by utility theory, and you don’t obey those axioms for any simple utility function, are you better off if:
the FAI ascribes to you some mixture of possible complex utility functions and helps you to achieve that, or
the FAI uses a better explanation of your behavior, perhaps one of those alternative theories listed in the wikipedia article, and helps you to achieve some component of that explanation?
I don’t understand the alternative theories well enough to know if the latter option even makes sense.