Is there a pointer available to the evidence that an “extrapolation” process a la CEV actually addresses this problem?
I think there’s some uncertainty about that, actually. The extrapolation procedure is never really specified in CEV, and I could imagine some extrapolation procedures which probably do eliminate the death-to-outgroups volition, and some extrapolation procedures which don’t. So an actual implementation would have a lot of details to fill in, and there are ways of filling in those details which would be bad (but this is true of everything about AI, really).
Underspecification is a problem with CEV, and Goertzel’s CAV paper rightly complains about it. The problem is that he proposes sticking an identity function into the extrapolation-procedure slot, and the identity function is one of the procedures that fails.
(nods) I haven’t read the CAV paper and am very much not defending it. You just sounded knowledgeable about CEV so I figured I’d take the opportunity to ask an unrelated question that’s been bugging me pretty much since I got here.
Agreed that there are lots of implementation details to work out either way (to say the least), but this seems like more than an implementation detail.
The whole idea underlying the FAI-through-CEV enterprise is that providing a seed AI with direct access to human minds is a better source for a specification of the values we want an AI to optimize for than, say, explicitly trying to develop an ideal ethical theory.
And the argument I’ve seen in favor of that idea has mostly focused on the many ways that a formalized explicit ethical theory can go horribly horribly wrong, which is entirely fair.
But it’s absurd to say “A won’t work, let’s do B” if B will fail just as badly as A does. If pointing to human minds and saying “go!” doesn’t reliably exclude values that, if optimized for, go horribly horribly wrong, then it really does seem like a fundamentally different third option is needed.
That’s not just an implementation detail, and it’s not just a place where it’s possible to get something wrong.