it includes people’s death-to-outgroups volitions unmodified [..] whereas CEV (which came first) doesn’t
Is there a pointer available to the evidence that an “extrapolation” process a la CEV actually addresses this problem? (Or, if practical, can it be summarized here?)
I’ve read some but not all of the CEV literature, and I understand that this process is intended to solve this problem, but I haven’t been able to grasp from that reading how we know it actually does.
It seems to depend on the idea that if we had world enough and time, we would outgrow things like “death-to-outgroups,” and therefore a sufficiently intelligent seed AI tasked with extrapolating what we would want given world enough and time will naturally come up with a CEV that doesn’t include such things… perhaps because such things are necessarily instrumental values rather than reflectively stable terminal values, perhaps for other reasons.
But surely there has to be more to it than that, as the “world enough and time” theory seems itself unjustified.
Is there a pointer available to the evidence that an “extrapolation” process a la CEV actually addresses this problem?
I think there’s some uncertainty about that, actually. The extrapolation procedure is never really specified in CEV, and I could imagine some extrapolation procedures which probably do eliminate the death-to-outgroups volition, and some extrapolation procedures which don’t. So an actual implementation would have a lot of details to fill in, and there are ways of filling in those details which would be bad (but this is true of everything about AI, really).
Underspecification is a problem with CEV, and Goertzel’s CAV paper rightly complains about it. The trouble is that he then proposes to plug an identity function into the extrapolation-procedure slot, and the identity function is one of the procedures that fails.
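To make the “slot” talk concrete, here’s a minimal sketch of the contrast, assuming nothing beyond what’s in this thread: the data structures, the function names, and the “stable under reflection” flag are invented purely for illustration and don’t come from the CEV or CAV papers. Plugging the identity function into the extrapolation slot passes a raw death-to-outgroups volition straight through; any slot filler that filters on reflective stability (however that judgment would actually be made) drops it.

```python
# Toy illustration only: "volitions" as tagged records and "extrapolation"
# as a list transformation. The names and the stability flag are invented
# for this sketch; nothing here is from the CEV or CAV papers.

RAW_VOLITIONS = [
    {"content": "flourishing for my family", "stable_under_reflection": True},
    {"content": "death to the outgroup", "stable_under_reflection": False},
]

def identity_extrapolation(volitions):
    """CAV-style slot filler: pass current volitions through unmodified."""
    return list(volitions)

def reflective_extrapolation(volitions):
    """One possible CEV-style slot filler: keep only volitions the person
    would still endorse after idealized reflection (faked here with a flag)."""
    return [v for v in volitions if v["stable_under_reflection"]]

# The identity filler keeps "death to the outgroup"; the reflective one drops it.
for extrapolate in (identity_extrapolation, reflective_extrapolation):
    kept = [v["content"] for v in extrapolate(RAW_VOLITIONS)]
    print(extrapolate.__name__, "->", kept)
```

The open question in this thread is, of course, whether that reflective-stability judgment can be specified non-arbitrarily; the sketch only shows where the disagreement lives, not how to resolve it.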
(nods) I haven’t read the CAV paper and am very much not defending it. You just sounded knowledgeable about CEV so I figured I’d take the opportunity to ask an unrelated question that’s been bugging me pretty much since I got here.
Agreed that there are lots of implementation details to work out either way (to say the least), but this seems like more than an implementation detail.
The whole idea underlying the FAI-through-CEV enterprise is that giving a seed AI direct access to human minds is a better way to specify the values we want it to optimize for than, say, explicitly trying to develop an ideal ethical theory.
And the argument I’ve seen in favor of that idea has mostly focused on the many ways that a formalized explicit ethical theory can go horribly horribly wrong, which is entirely fair.
But it’s absurd to say “A won’t work, let’s do B” if B will fail just as badly as A does. If pointing to human minds and saying “go!” doesn’t reliably exclude values that, if optimized for, go horribly horribly wrong, then it really does seem like a fundamentally different third option is needed.
That’s not just an implementation detail, and it’s not just a place where it’s possible to get something wrong.