I’m going to say something critical. I mean it earnestly on the object level, but bear you no ill will on the social or interpersonal level. In fact, I think that making this post is a positive sign for your future :)
A mental mistake has been made here, and I think you’re not alone in making it. We humans valorize a mother’s love for her children. We humans think that its generalizing to new situations is right and proper. So at first glance it might seem like evolution has miraculously produced something “robustly aligned” in the good generalization properties of a mother’s love for her children.
But evolution does not care about motherly love; it only cares about fitness. If a child loses their gonads at age 2, evolution would rather (to the extent that it would rather anything) that the mother stop devoting resources to that child and have a new one.
Evolution was just promoting fitness. Motherly love is a great result for us humans, who think motherly love is great, but to evolution it’s just another suboptimal kludge. See the Tragedy of Group Selectionism. The rightness-according-to-humans is bleeding over and affecting your judgment about rightness-according-to-evolution.
All of this is to say: the alignment problem is as hard as it ever was, because motherly love is not a triumph of evolution aligning humans. It’s something we think is good, and we think generalizes in good ways, because we are talking about ourselves, our own values. The baby-eater aliens would praise evolution for so robustly aligning them to eat babies, and the puddle would praise the rainstorm for dropping it in a hole so suited for its shape. None of this is evidence that the optimization process that produced them is good at aligning things.
I recently articulated similar ideas about motherly love. I don’t think it’s an example of successful alignment in the sense of evolution’s goals being aligned with the mother’s goals. In the example you give, where a child loses their gonads at age 2, it would be an alignment failure (from evolution’s perspective) if the mother continued devoting resources to the child. In reality the mother wouldn’t stop, because with motherly love, evolution created an imperfect intermediate goal that is generally, but not always, the same as the goal of spreading her genes.
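In code-sketch form, the “imperfect intermediate goal” is a proxy that agrees with the true objective in typical cases and diverges in exactly the edge case above. This is purely illustrative; the fields and numbers are made up:

```python
# Hypothetical toy: motherly love as a proxy that usually tracks fitness.
def true_fitness_gain(child):
    # What evolution "wants": expected future offspring via this child.
    return child["expected_offspring"]

def motherly_love_drive(child):
    # The proxy evolution actually installed: care for your child, no matter what.
    return 1.0 if child["is_mine"] else 0.0

typical = {"is_mine": True, "expected_offspring": 2.0}
edge    = {"is_mine": True, "expected_offspring": 0.0}  # gonads lost at age 2

for child in (typical, edge):
    print(motherly_love_drive(child), true_fitness_gain(child))
# Proxy says "care" in both cases; fitness agrees only in the first.
```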
I totally agree that motherly love is not a triumph of evolution aligning humans with its goals. But I think it’s a good example of robust alignment between the mother’s actions and the child’s interests that generalizes well to OOD environments.
Thank you for your detailed feedback. I agree that evolution doesn’t care about anything, but I think the baby-eater aliens would not see it that way. They could well say that evolution aligned them to eat babies, but in their case it is an alignment of their values with themselves, not with any other agent or entity.
In our story we somehow care about somebody else, and it is their story that ends up with the “happy ending”. I also agree that, given enough time, evolution would probably make us stop caring about children who we think can no longer reproduce, but that would be a much more complex solution.
As a first step, it is probably much easier to just “make an animal that cares about its babies no matter what”; otherwise you would have to count on that animal’s ability to recognize something it might not even understand (like a baby’s reproductive capacity).
Ah, I see what you mean, and I see that I made a mistake: I didn’t understand that your post was about human mothers being aligned with their children, not just with evolution.
To some extent I think my comment makes sense as a reply, because trying to optimize[1] a black-box optimizer for the fitness of a “simulated child” is still going to end up with the “mother” executing kludgy strategies, rather than recapitulating evolution to arrive at human-like values.
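To make the “kludgy strategies” point concrete, here is a minimal, hypothetical sketch (all names and numbers invented) of black-box-optimizing a “mother” policy against a proxy fitness score for a simulated child:

```python
import random

# Toy proxy: the scorer measures "thriving" by weight gain, so feeding is
# rewarded without bound, while play (which the child values) burns
# calories and slightly *lowers* the measured score.
def child_proxy_fitness(feed, play):
    return 3.0 * feed - 0.5 * play

# Generic black-box optimizer: random perturbations, keep improvements.
def hill_climb(score_fn, steps=2000, step_size=0.05):
    params = [0.5, 0.5]  # (feed, play), each clipped to [0, 1]
    best = score_fn(*params)
    for _ in range(steps):
        cand = [min(1.0, max(0.0, p + random.uniform(-step_size, step_size)))
                for p in params]
        s = score_fn(*cand)
        if s > best:
            params, best = cand, s
    return params, best

params, best = hill_climb(child_proxy_fitness)
print(f"feed={params[0]:.2f}, play={params[1]:.2f}, proxy={best:.2f}")
# Converges toward feed=1.00, play=0.00: a kludge that maximizes the
# measured proxy rather than recapitulating anything like human values.
```

The point of the sketch is only that a black-box search optimizes the score it is given; any divergence between the score and the child’s actual values is invisible to it.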
EDIT: Of course my misunderstanding makes most of my attempt to psychologize you totally false.
But my comment also kinda doesn’t make sense, because, since I didn’t understand your post, I somewhat glaringly failed to mention other key considerations. For example: mothers who love their children still want other things too, so how are we picking out which parts of their desires are “love for children”? Doing this requires an abstract model of the world, and that abstract model might “cheat” a little by treating love as a simple thing that corresponds to optimizing for the child’s own values, even if it’s messy and human.
A related pitfall: if you’re training an AI to take care of a simulated child, thinking about this process using the abstract model we use to think about mothers loving their children will treat “love” as a simple concept that the AI might hit upon all at once. But that intuitive abstract model will not treat ruthlessly exploiting the simulated child’s programming, pushing it outside of its intended context to get a high score, as something equally simple, even though that might happen.
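As a toy illustration of that last failure mode (everything here is hypothetical, not anyone’s actual training setup): suppose the simulated child’s “happiness” score was only ever designed for stimuli in a sensible range, and a score-maximizing search discovers the out-of-range exploit.

```python
# The designer intended stimuli in [0, 10], with a mild preference for ~5.
def simulated_child_score(stimulus: float) -> float:
    if 0.0 <= stimulus <= 10.0:
        return 10.0 - abs(stimulus - 5.0)  # intended context: max score 10
    # Programming artifact outside the intended context: the formula was
    # never meant to be evaluated here, and it goes haywire.
    return stimulus ** 2

# A score-maximizing search over a wide action space finds the exploit
# immediately, rather than anything resembling "love".
actions = [i / 10.0 for i in range(-1000, 1001)]
best = max(actions, key=simulated_child_score)
print(best, simulated_child_score(best))  # -100.0 10000.0
```

Nothing in the intuitive “love” abstraction flags this as a simple or likely outcome, but to the optimizer it’s just the highest-scoring point it found.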
[1] especially with evolution, but also with gradient descent