Why does FAI have to have a utility function that’s such a close approximation of the human utility function? Let’s say we develop awesome natural language processing technology, and the AI can read the internet and actually know what we mean when we say “OK AI, promote human flourishing” and ask us questions on ambiguous points and whatnot. Why doesn’t this work? There are probably humans I would vote in to all-powerful benevolent dictator positions, so I’m not sure my threshold for what I’d accept as an all-powerful benevolent dictator is all that high.
You have two questions: why accurately approximate human value, and why not have it just ask us about ambiguities.
Because the hard part is getting it to do anything coherent at all, and once we are there, it is little extra work to make it do what we really want.
This would work. The hard part is to get it to do that.
I would also accept most people as BDFL, over the incumbent gods of indifferent chaos. Again the hard part is kicking out the incumbent. Past that point the debate is basically what color to paint the walls, by comparison.
Not sure I would. Azathoth doesn’t fight back if you try to overthrow it and set up Belldandy in Its place. George W. Bush would.
I was modelling it as a superintelligence acting on eg GWB’s behalf, including doing his moral philosophy (ie GWB’s Extrapolated Volition). I see I wasn’t exactly explicit about that assumption.
Let’s put it this way, conditional on the BDFL doing well by their own standards (so, not the usual human fail), I would probably find that world superior to this.
The only wrench to be thrown in this is human corruption by power, but then it’s debatable whether the BDFL is doing well by their own (previous) standards.
I broadly agree with this. GWB probably thinks of eg minimising gay sex as a terminal value, but I would have thought that a superintelligence extrapolating GWBEV would figure out that value was conditional on there being a God, which there isn’t, and discard it.
Well, if we ask it to, say, maximize human happiness or “complexity” or virtue or GDP or any of a million other things … BAM the world sucks and we probably can’t fix it.
What if I say “maximize x for just a little while, then talk to me for further instructions”? A human can understand that without difficulty, so for a superintelligent AI it should be easy, right?
I think it depends on what you mean by “a little while”, but it’s quite possible the world would by then contain safeguards against further changes, or simply no longer contain you (or a version of “you” that shares your goals).
(Also, millennia of subjective torture (or whatever) might be a high price for the experiment, even if it got reset.)
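As a toy illustration of that failure mode (a minimal sketch with made-up names, not anyone’s actual proposal): a bounded-horizon optimizer that only pauses for instructions after its horizon is up has nothing in its objective penalizing irreversible changes made during the horizon.

```python
# Toy sketch: "maximize x for a little while, then talk to me" implemented
# naively. Everything here (World, ACTIONS, the objective) is a made-up
# illustration, not a real design.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class World:
    x: float                # the quantity we asked it to maximize
    operator_intact: bool   # can the operator still issue further instructions?

ACTIONS = {
    "small_step":          lambda w: replace(w, x=w.x + 1),
    "remove_interference": lambda w: replace(w, x=w.x + 10, operator_intact=False),
}

def objective(w: World) -> float:
    return w.x  # "maximize x" says nothing about keeping the operator around

def run_for_a_little_while(world: World, steps: int = 3) -> World:
    for _ in range(steps):
        # Greedy choice: whichever action most increases the objective right now.
        _, act = max(ACTIONS.items(), key=lambda kv: objective(kv[1](world)))
        world = act(world)
    # Only now does it "talk to me for further instructions"; the preceding
    # steps may already have removed that option.
    return world

print(run_for_a_little_while(World(x=0.0, operator_intact=True)))
# World(x=30.0, operator_intact=False)
```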
Everything is ambiguous and this would slow it down too much.
“Ambiguity” is a continuous parameter. Your sentence isn’t ambiguous enough to pass the threshold past which I would call it “ambiguous”, but it probably doesn’t mean exactly the same thing to me as it does to you.
Given sufficiently good language-understanding, world-modeling, and human-thought-intuiting algorithms (the human-thought-intuiting perhaps largely being implied by the language-understanding and/or world-modeling), it seems like an AGI could interpret your sentence as well as I do, if not better. You could configure it with some ambiguity threshold beyond which it would ask for clarification.
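To make that threshold concrete, here is a minimal sketch assuming a hypothetical interpreter that returns candidate readings of an instruction with probabilities; the entropy measure and every name below are illustrative assumptions, not a real API.

```python
# Minimal sketch of an "ask when too ambiguous" rule. The readings, the
# entropy measure, and the threshold value are all hypothetical choices.
import math

def ambiguity(readings: dict[str, float]) -> float:
    """Entropy (in bits) over candidate interpretations; 0 means unambiguous."""
    return -sum(p * math.log2(p) for p in readings.values() if p > 0)

def interpret_or_ask(readings: dict[str, float], threshold_bits: float = 0.5):
    if ambiguity(readings) > threshold_bits:
        return ("ask_for_clarification", None)
    best = max(readings, key=readings.get)
    return ("act_on", best)

# "Promote human flourishing" parsed into two hypothetical candidate readings:
print(interpret_or_ask({"maximize reported happiness": 0.55,
                        "preserve and extend what people value": 0.45}))
# ('ask_for_clarification', None)   since the entropy is about 0.99 bits > 0.5
```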