Humans don’t explicitly pursue inclusive genetic fitness; outer optimization even on a very exact, very simple loss function doesn’t produce inner optimization in that direction. This happens in practice in real life; it is what happened in the only case we know about.
I’m not very compelled by this, I think.
Evolution was doing very little (essentially zero) adversarial training: guessing ahead to the sorts of circumstances under which humans would pursue strategies that didn’t maximize inclusive genetic fitness, testing humans in those circumstances, and penalizing them for deviations from the outer loss function.
But that seems like a natural thing to do when training an AI system.
In short, evolution wasn’t trying very hard to align humans, so the fact that they ended up not very aligned doesn’t seem like much evidence about what happens when you do try.