Two observations:
A. If you think that people’s genes would be a lot fitter if people cared more about fitness, then surely there’s a good chance that a more efficient version of natural selection would lead to people caring more about fitness.
B. You might, on the other hand, think that the problem is more related to feedbacks. That is, if you’re the smartest monkey, you can spend your time scheming to have all the babies; if there are many smart monkeys, you have to spend a lot of time worrying about what the other monkeys think of you. If this is how you’re worried misalignment will arise, then I think “how do deep learning models generalise?” is the wrong tree to bark up.
C. If people did care about fitness, would Yudkowsky not say “instrumental convergence! Reward hacking!”? I’d even be inclined to grant he had a point.