Interesting! Definitely agree that if people’s specific social histories are largely what qualify them to be ‘in the loop,’ this would be hard to replicate for the reasons you bring up. However, consider that, for example, very young children (and even chimpanzees) already exhibit prosocial behaviors like spontaneous helping, which almost certainly has nothing to do with their social history. I think there’s a solid argument to be made, then, that a lot of these social histories are essentially a lifelong finetuning of core prosocial algorithms that have in some sense been there all along. And I am mainly excited about enumerating these. (Note also that figuring out these algorithms and running them in an RL training procedure might get us the relevant social-history training that you reference, but we’d need the core algorithms first.)
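To make that parenthetical concrete, here is a minimal toy sketch of what ‘running a core prosocial algorithm in an RL training procedure’ could amount to, namely reward shaping. This is not anyone’s actual proposal: `prosocial_score`, the two actions, and all the constants are hypothetical placeholders standing in for whatever core algorithm we would first have to enumerate.

```python
# Toy sketch: shape an RL reward with a hypothetical "core prosocial
# algorithm" term, so experience (the "social history") finetunes behavior
# that the shaping term biases toward from the start.

import random

def prosocial_score(state, action):
    # Placeholder for the enumerated core algorithm. In this toy,
    # "sharing" (action 1) scores higher than "hoarding" (action 0).
    return 1.0 if action == 1 else 0.0

def environment_reward(state, action):
    # Toy task reward: hoarding pays more in the short term.
    return 2.0 if action == 0 else 1.0

# One-step tabular Q-learning with the shaped reward
# r_total = r_env + beta * r_prosocial.
q = {0: 0.0, 1: 0.0}
beta, alpha, epsilon = 2.0, 0.1, 0.1

for step in range(5000):
    if random.random() < epsilon:
        action = random.choice([0, 1])      # explore
    else:
        action = max(q, key=q.get)          # exploit
    r = environment_reward(None, action) + beta * prosocial_score(None, action)
    q[action] += alpha * (r - q[action])    # one-step update, no next state

print(q)  # with beta large enough, the "sharing" action wins out
```

The point of the sketch is just that the shaping term does the work: without a concrete `prosocial_score`, the training procedure has nothing to finetune, which is why the core algorithms have to come first.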
“human in the loop” to some extent translates to “we don’t actually know why we trust (some) other humans, but there exist humans we trust, so let’s delegate the hard part to them”.
I totally agree with this statement taken by itself, and my central point is that we should actually attempt to figure out ‘why we trust (some) other humans’ rather than treating this as a kind of black box. However, if this statement is being put forward as an argument against doing so, then it seems circular to me.
I don’t know of anyone advocating using children or chimpanzees as AI supervisors or trainers. The gap from evolved/early-learning behaviors to the “hard part” of human alignment is pretty massive.
I don’t have any better ideas than human-in-the-loop. I’m somewhat pessimistic about its effectiveness if AI significantly surpasses the humans in prediction/optimization power, but it’s certainly worth including in the research agenda.
I don’t know of anyone advocating using children or chimpanzees as AI supervisors or trainers.
I think you are talking past each other. The argument is not that children would be a good choice for AI trainers. The argument is that children (and chimpanzees) show prosocial behavior. You don’t have to train chimps and children for 30 years before they figure out social behavior.
If you want to replace competent humans as trainers, then yes, but having an AI that cares about humans would be a nice achievement too.
I think it’s a relevant point. Children and chimps show some kinds of behavior we classify as prosocial, fine. But that’s a motte which does NOT defend the bailey that ‘human-in-the-loop’ is necessary (because only evolution can generate these behaviors), or the bailey that HITL is sufficient (or even useful) because we only need the kind of prosociality that seems close to universal.