Ah, I see what you mean and that I made a mistake—I didn’t understand how your post was about human mothers being aligned with their children, not just with evolution.
To some extent I think my comment makes sense as a reply, because trying to optimize[1] a black-box optimizer for the fitness of a “simulated child” is still going to end up with the “mother” executing kludgy strategies, rather than recapitulating evolution to arrive at human-like values.
EDIT: Of course my misunderstanding makes most of my attempt to psychologize you totally false.
But my comment also partly doesn’t make sense: since I didn’t understand your post, it glaringly omits other key considerations. For example: mothers who love their children still want other things too, so how do we pick out which parts of their desires count as “love for children”? Doing this requires an abstract model of the world, and that abstract model might “cheat” a little by treating love as a simple thing that corresponds to optimizing for the child’s own values, even when it’s actually messy and human.
A related pitfall: if you’re training an AI to take care of a simulated child, thinking about that process using the abstract model we use for mothers loving their children will treat “love” as a simple concept the AI might hit upon all at once. But that intuitive abstract model will not treat ruthlessly exploiting the simulated child’s programming (pushing it outside its intended context to get a high score) as something equally simple, even though that might happen.
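To make the exploitation failure mode concrete, here is a minimal toy sketch (everything in it, the scoring function, its bug, and the optimizer, is hypothetical and invented for illustration): a “well-being” score that is only meaningful on the intended range of actions, plus a black-box hill-climber that drifts outside that range and exploits the bug instead of finding the intended optimum.

```python
import random

# Hypothetical "simulated child": its well-being score is only
# meaningful for actions in the intended range [0, 1].
def child_wellbeing(action: float) -> float:
    if 0.0 <= action <= 1.0:
        # Intended behavior: well-being peaks at a moderate action (0.5).
        return 1.0 - (action - 0.5) ** 2
    # Bug: outside the intended context the score grows without bound,
    # rewarding extreme actions the designers never anticipated.
    return abs(action)

def hill_climb(score, start=0.5, steps=2000, step_size=1.0):
    """Black-box optimizer: random local search over a single action."""
    best = start
    best_score = score(best)
    for _ in range(steps):
        candidate = best + random.uniform(-step_size, step_size)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

random.seed(0)
action, score = hill_climb(child_wellbeing)
# The optimizer leaves [0, 1] and exploits the scoring bug rather than
# settling on the intended optimum near action = 0.5.
print(action, score)
```

The point of the sketch is that nothing in the optimizer resembles “love” or any other simple concept; it just climbs whatever gradient the scoring function happens to expose, including the unintended part.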
[1] especially with evolution, but also with gradient descent