Dagon comments on Alignment via prosocial brain algorithms

Dagon 13 Sep 2022 0:25 UTC
2 points
0
I don’t know of anyone advocating using children or chimpanzees as AI supervisors or trainers. The gap from evolved/early-learning behaviors to the “hard part” of human alignment is pretty massive.
I don’t have any better ideas than human-in-the-loop. I’m somewhat pessimistic about its effectiveness if AI significantly surpasses humans in prediction/optimization power, but it’s certainly worth including in the research agenda.
- Gunnar_Zarncke 13 Sep 2022 20:02 UTC
  8 points
  1
  “I don’t know of anyone advocating using children or chimpanzees as AI supervisors or trainers.”
  I think you are talking past each other. The argument is not that children would be a good choice for AI trainers. The argument is that children (and chimpanzees) show pro-social behavior. You don’t have to train chimps and children for 30 years until they figure out social behavior.
  Yes, that gap matters if you want to replace competent humans as trainers, but having an AI that cares about humans would be a nice achievement too.
  - Dagon 13 Sep 2022 20:28 UTC
    2 points
    0
    I think it’s a relevant point. Children and chimps show some kinds of behavior we classify as prosocial, fine. But that’s a motte which does NOT defend the bailey that “human-in-the-loop” is necessary because only evolution can generate these behaviors, OR that HITL is sufficient (or even useful) because we only need the kind of prosociality that seems close to universal.