I think perhaps a lot of work is being done by “if your optimiser worked”. This might also be where there’s a disanalogy between humans<->evolution and AIs<->SGD+PPO (or whatever RL algorithm you’re using to optimise the policy). Maybe evolution is actually a very weak optimiser that doesn’t really “work”, compared to SGD+RL.
I think that evolution is not the relevant optimizer for humans in this situation. Instead, consider the within-lifetime learning that goes on in human brains. Humans are very probably reinforcement learning agents in a relevant sense; in some ways, humans are the best reinforcement learning agents we have ever seen.