(nods) I haven’t read the CAV paper and am very much not defending it. You just sounded knowledgeable about CEV so I figured I’d take the opportunity to ask an unrelated question that’s been bugging me pretty much since I got here.
Agreed that there are lots of implementation details to work out either way (to say the least), but this seems like more than an implementation detail.
The whole idea underlying the FAI-through-CEV enterprise is that giving a seed AI direct access to human minds is a better source for a specification of the values we want an AI to optimize for than, say, trying to develop an explicit, ideal ethical theory.
And the argument I’ve seen in favor of that idea has mostly focused on the many ways that a formalized explicit ethical theory can go horribly horribly wrong, which is entirely fair.
But it’s absurd to say “A won’t work, let’s do B” if B will fail just as badly as A does. If pointing to human minds and saying “go!” doesn’t reliably exclude values that, if optimized for, go horribly horribly wrong, then it really does seem like a fundamentally different third option is needed.
That’s not just an implementation detail, and it’s not just a place where it’s possible to get something wrong.