I just took a look at Ben Goertzel’s CAV (Coherent Aggregated Volition). As far as I can tell, it includes people’s death-to-outgroups volitions unmodified and thereby destroys the world, whereas CEV (which came first) doesn’t. He presents the desire to murder as an example, fails to address it, and then goes on to talk about running experiments on aggregating the volitions of trivial, non-human agents. That looks like a serious rationality failure in the direction of ignoring danger, and I get the same impression from his other writing, too.
The more of Ben Goertzel’s writing I read, the less comfortable I am with him controlling OpenCog. If OpenCog turns into a seed AI, I don’t think it’s safe for him to be the one making the launch/no-launch decision. I also don’t think it’s safe for him to be setting directions for the project before then, either.
it includes people’s death-to-outgroups volitions unmodified [...] whereas CEV (which came first) doesn’t
Is there a pointer available to the evidence that an “extrapolation” process a la CEV actually addresses this problem? (Or, if practical, can it be summarized here?)
I’ve read some but not all of the CEV literature, and I understand that this process is intended to solve this problem, but I haven’t been able to grasp from that how we know it actually does.
It seems to depend on the idea that if we had world enough and time, we would outgrow things like “death-to-outgroups,” and therefore a sufficiently intelligent seed AI tasked with extrapolating what we would want given world enough and time will naturally come up with a CEV that doesn’t include such things… perhaps because such things are necessarily instrumental values rather than reflectively stable terminal values, perhaps for other reasons.
But surely there has to be more to it than that, as the “world enough and time” theory seems itself unjustified.
Is there a pointer available to the evidence that an “extrapolation” process a la CEV actually addresses this problem?
I think there’s some uncertainty about that, actually. The extrapolation procedure is never really specified in CEV, and I could imagine some extrapolation procedures which probably do eliminate the death-to-outgroups volition, and some extrapolation procedures which don’t. So an actual implementation would have a lot of details to fill in, and there are ways of filling in those details which would be bad (but this is true of everything about AI, really).
Underspecification is a problem with CEV, and Goertzel’s CAV paper rightly complains about it. The problem is that he proposes to put an identity function in the extrapolation-procedure slot (that is, aggregating raw volitions unmodified), which is one of the procedures that fails.
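To make the distinction concrete, here is a toy sketch (purely illustrative; the agents, values, and the aggregation rule are all hypothetical, and the “reflective” step is hard-coded rather than derived from anything): aggregating raw volitions, as an identity extrapolation does, preserves a dangerous value held by a majority, while an extrapolation step that models what agents would abandon on reflection can remove it.

```python
from collections import Counter

# Each agent's raw volition: a set of terminal-looking values.
# These agents and values are invented for the example.
raw_volitions = [
    {"prosperity", "death-to-outgroups"},
    {"prosperity", "death-to-outgroups"},
    {"prosperity", "knowledge"},
]

def identity_extrapolation(volition):
    """Pass the raw volition through unmodified (the CAV-style slot-filler)."""
    return volition

def reflective_extrapolation(volition):
    """One possible extrapolation: drop values the agent would abandon
    on reflection. Here that set is simply stipulated for the toy model;
    the hard part of CEV is computing it, which this sketch does not do."""
    would_abandon = {"death-to-outgroups"}
    return volition - would_abandon

def aggregate(volitions, extrapolate):
    """Aggregate by majority vote over (extrapolated) values."""
    counts = Counter()
    for v in volitions:
        counts.update(extrapolate(v))
    majority = len(volitions) / 2
    return {value for value, n in counts.items() if n > majority}

print(aggregate(raw_volitions, identity_extrapolation))
# the dangerous 'death-to-outgroups' volition survives aggregation
print(aggregate(raw_volitions, reflective_extrapolation))
# only 'prosperity' remains
```

The point of the sketch is only that the choice of extrapolation procedure, not the aggregation step, does all the safety-relevant work; an identity function in that slot changes nothing about the inputs.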
(nods) I haven’t read the CAV paper and am very much not defending it. You just sounded knowledgeable about CEV so I figured I’d take the opportunity to ask an unrelated question that’s been bugging me pretty much since I got here.
Agreed that there are lots of implementation details to work out either way (to say the least), but this seems like more than an implementation detail.
The whole idea underlying the FAI-through-CEV enterprise is that providing a seed AI with direct access to human minds is a better source for a specification of the values we want an AI to optimize for than, say, explicitly trying to develop an ideal ethical theory.
And the argument I’ve seen in favor of that idea has mostly focused on the many ways that a formalized explicit ethical theory can go horribly horribly wrong, which is entirely fair.
But it’s absurd to say “A won’t work, let’s do B” if B will fail just as badly as A does. If pointing to human minds and saying “go!” doesn’t reliably exclude values that, if optimized for, go horribly horribly wrong, then it really does seem like a fundamentally different third option is needed.
That’s not just an implementation detail, and it’s not just a place where it’s possible to get something wrong.
Here is an interesting interview between Hugo de Garis and Ben Goertzel:

Gut feeling: I’d probably sacrifice myself to create a superhuman artilect, but not my kids…. I do have huge ambitions and interests going way beyond the human race – but I’m still a human.
[...]
And the better an AGI theory we have, the more intelligently we’ll be able to bias the odds. But I doubt we’ll be able to get a good AGI theory via pure armchair theorizing. I think we’ll get there via an evolving combination of theory and experiment – experiment meaning, building and interacting with early-stage proto-AGI systems of various sorts.
experiment meaning, building and interacting with early-stage proto-AGI systems of various sorts.
I’m not very familiar with Goertzel’s ideas. Does he recognize the importance of not letting the proto-AGI systems self-improve while their values are uncertain?
From what I’ve gathered, Ben thinks that these experiments will reveal that friendliness is impossible, that ‘be nice to humans’ is not a stable value. I’m not sure why he thinks this.
OpenCog is open source anyway: anything Goertzel can do can be done by anyone else. If Goertzel didn’t think it was safe to run, what’s stopping someone else from running it?
Isn’t that even worse?