Though, of course, there’s no guarantee that individual humans won’t all be completely horrified by the prospect.
Recall that after CEV extrapolates current humans’ volitions and constructs a coherent superposition, the next step isn’t “do everything that superposition says” but rather “ask that superposition the one question ‘Given the world as it is right now, what program should we run next?’, run that program, and then shut down” (a toy sketch of this one-shot structure appears below). I suppose it’s possible that our CEV will produce an AI that immediately does something we find horrifying, but I think our future selves are nicer than that… or could be nicer than that, if extrapolated the right way. So I’d consider it a failure of Friendliness if we get a “do something we’d currently find horrifying for the greater good” AI when a different extrapolation strategy would instead have produced something like a “start with the most agreeable and urgent stuff, and beyond that, protect us while we grow up and give us help where we need it” AI.
I really doubt that we’d need an AI to do anything immediately horrifying to the human species in order to allow it to grow up into an awesome fun posthuman civilization, so if CEV 1.0 Beta 1 appeared to be going in that direction, that would probably be considered a bug and fixed.
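For concreteness, here is a minimal, purely illustrative Python sketch of the one-shot structure described above: extrapolate, ask the single question, run the resulting program P once, and halt. Every name in it (`extrapolate_volitions`, `construct_program`, `Program`) is invented for illustration, and the hard parts are stubbed out; it says nothing about how the extrapolation step would actually be done.

```python
# Toy sketch only: the hard parts (extrapolation, coherence, program synthesis)
# are replaced by placeholders. The point is the control flow: one question,
# one program, one run, then shut down.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Program:
    """Stand-in for the next-step program P that the extrapolation proposes."""
    description: str
    run: Callable[[], None]


def extrapolate_volitions(population: List[str]) -> str:
    """Placeholder for extrapolating volitions and constructing a coherent
    superposition of them; here it just returns a descriptive string."""
    return f"coherent superposition over {len(population)} volitions"


def construct_program(superposition: str) -> Program:
    """Placeholder for asking the superposition the single question:
    'Given the world as it is right now, what program should we run next?'"""
    return Program(
        description=f"next-step program proposed by {superposition}",
        run=lambda: print("running next-step program P ... done"),
    )


def one_shot_cev(population: List[str]) -> None:
    superposition = extrapolate_volitions(population)
    p = construct_program(superposition)
    p.run()   # run P exactly once ...
    return    # ... and then shut down; nothing else happens after this point


if __name__ == "__main__":
    one_shot_cev(["alice", "bob", "carol"])
```

The disagreement that follows is entirely about what should happen if running P turns out to horrify people, which this sketch deliberately says nothing about.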
(shrug) Sure, if you’re right that the “most urgent and agreeable stuff” doesn’t happen to press a significant number of people’s emotional buttons, then it follows that not many people’s emotional buttons will be pressed.
But there’s a big difference between assuming that this will be the case, and considering it a bug if it isn’t.
Either I trust the process we build more than I trust my personal judgments, or I don’t.
If I don’t, then why go through this whole rigmarole in the first place? I should prefer to implement my personal judgments. (Of course, I may not have the power to do so, and may prefer to join more powerful coalitions whose judgments are close enough to mine. But in that case CEV becomes a mere political compromise among the powerful.)
If I do, then it’s not clear to me that “fixing the bug” is a good idea.
That is, OK, suppose we write a seed AI intended to work out humanity’s collective CEV, work out some next-step goals based on that CEV and an understanding of likely consequences, construct a program P to implement those goals, run P, and quit.
Suppose that I am personally horrified by the results of running P. Ought I choose to abort P? Or ought I say to myself, “Oh, how interesting: my near-mode emotional reactions to the implications of what humanity really wants are extremely negative. Still, most everybody else seems OK with it. OK, fine: this is not going to be a pleasant transition period for me, but my best guess is still that it will ultimately be for the best”?
Is there some number of people such that if more than that many people are horrified by the results, we ought to choose to abort P?
Does the question even matter? The process as you’ve described it doesn’t include an abort mechanism; whichever choice we make, P is executed.
Ought we include such an abort mechanism? It’s not at all clear to me that we should. I can get on a roller coaster or choose not to get on it, but giving me a brake pedal on a roller coaster is kind of ridiculous.
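To make the head-count question concrete, here is a toy, hypothetical sketch of what an abort mechanism might reduce to if we did build one: a simple threshold rule over how many people are horrified. The function name and the threshold value are invented for illustration; nothing above says such a mechanism should exist at all.

```python
# Hypothetical abort rule, not part of any described design: abort P if more
# than some fraction of the population is horrified by its results.

def should_abort(num_horrified: int, population_size: int,
                 threshold_fraction: float = 0.5) -> bool:
    """Return True if the horrified fraction exceeds the chosen threshold.

    Set threshold_fraction near 0 and personal judgment overrides the process;
    set it at 1 and the abort mechanism is purely decorative.
    """
    return num_horrified / population_size > threshold_fraction


if __name__ == "__main__":
    print(should_abort(num_horrified=3, population_size=10))  # False: P keeps running
    print(should_abort(num_horrified=8, population_size=10))  # True: pull the brake
```

Whether any value of `threshold_fraction` is defensible is exactly the roller-coaster question.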