“Just think about your gut-level reaction to hearing ‘EY wants to implement CEV for only SIAI volunteers and donors’ and to ‘EY wants to implement CEV for all of humanity.’”
The first actually sounds better to me. I am fairly certain most SIAI-involved people are well-meaning, or at the very least would not choose to cause J Random Stranger any harm if they could help it. I’m not so certain about ‘all of humanity’.
The relevant comparison isn’t what ‘all of humanity’ would choose, but rather what all of humanity would choose once CEV is done with their preferences.
This has been a source of confusion to me about the theory since I first encountered it, actually.
Given that this hypothetical CEV-extracting process gets results that aren’t necessarily anything that any individual actually wants, how do we tell the difference between an actual CEV-extracting process and something that was intended as a CEV-extracting process but that, due to a couple of subtle bugs in its code, is actually producing something other than its target’s CEV?
Is the idea that humanity’s actual CEV is something that, although we can’t necessarily come up with it ourselves, is so obviously the right answer once it’s pointed out to us that we’ll all nod our heads and go “Of course!” in unison?
Or is there some other testable property that only humanity’s actual CEV has? What property, and how do we test for it?
Because without such a testable property, I really don’t see why we believe flipping the switch on the AI that instantiates it is at all safe.
I have visions of someone perusing the resulting CEV assembled by the seed AI and going “Um… wait. If I’m understanding this correctly, the AI you instantiate to implement CEV will cause us all to walk around with watermelons on our feet.”
“Yes,” replies the seed AI, “that’s correct. It appears that humans really would want that, given enough time to think together about their footwear preferences.”
“Oh… well, OK,” says the peruser. “If you say so...”
Surely I’m missing something?
In light of some later comment-threads on related subjects, and in the absence of any direct explanations, I tentatively (20-40% confidence) conclude that the attitude is that the process that generates the code that extracts the CEV that implements the FAI has to be perfect, in order to ensure that the FAI is perfect, which is important because even an epsilon deviation from perfection multiplied by the potential utility of a perfect FAI represents a huge disutility that might leave us vomiting happily on the sands of Mars.
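To spell out the arithmetic that this view seems to lean on (the numbers below are purely illustrative assumptions on my part, not anything stated in the discussion): the expected loss from an imperfect FAI scales as the size of the deviation times the stakes, so no nonzero deviation stays small once the stakes are astronomical.

\[
\text{expected loss} \;\approx\; \epsilon \times U_{\text{perfect FAI}},
\qquad \text{e.g. } \epsilon = 10^{-9},\; U_{\text{perfect FAI}} = 10^{30} \;\Rightarrow\; \text{loss} \approx 10^{21}.
\]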
And since testing is not a reliable process for achieving perfection, merely for reducing defects to epsilon, it seems to follow that testing simply isn’t relevant. We don’t test the CEV-generator, by this view; rather we develop it in such a way that we know it’s correct.
And once we’ve done that, we should be more willing to trust the CEV-generator’s view of what we really want than our own view (which is demonstrably unreliable).
So if it turns out to involve wearing watermelons on our feet (or living gender-segregated lives on different planets, or whatever it turns out to be) we should accept that that really is our extrapolated volition, and be grateful, even if our immediate emotional reaction is confusion, disgust, or dismay.
I hasten to add that I’m not supporting this view, just trying to understand it.
Given the choice between (apparently benevolent people’s volition) + (unpredictable factor) or (all people’s volition) + (random factor) I’d choose the former every time.
Extrapolating volition doesn’t make it agree with mine.
eh?