We decide what loss functions to train the AIs with. It’s not as though AIs have some inbuilt reward circuitry, specified by evolution, that pushes them to maximize their reproductive fitness. We can simply choose to reinforce cooperative behavior.
I think this creates a massive power disparity between us and the AIs, in our favor. Someone with total control over your reward circuitry would have an enormous advantage over you.
Maybe a nitpick, but ideally the reinforcement shouldn’t just be based on “behavior”; you want to reward the agent when it does the right thing for the right reasons. Right? (Or maybe you’re defining “cooperative behavior” to include not just the external behavior but also the underlying motivations?)
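To make the first point concrete, here is a toy sketch of what “we choose what gets reinforced” looks like in code. Everything in it is a hypothetical stand-in: a hand-picked reward table over three made-up actions, a tabular softmax policy, and a plain REINFORCE-style update. It is not anyone’s actual training setup, just an illustration that the policy ends up favoring whatever the trainer’s reward signal rewards.

```python
import numpy as np

# Toy illustration: the trainer, not evolution, picks the reward signal.
# The action set and reward values below are hypothetical stand-ins.

rng = np.random.default_rng(0)

actions = ["cooperate", "deceive", "refuse"]
# Reward table chosen by the trainer; this plays the role of
# "the loss function we decide to train with".
reward = {"cooperate": 1.0, "deceive": -1.0, "refuse": 0.0}

logits = np.zeros(len(actions))  # tabular softmax policy parameters
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(logits)
    a = rng.choice(len(actions), p=probs)
    r = reward[actions[a]]
    # REINFORCE update: grad of log pi(a) for a softmax policy is
    # onehot(a) - probs; scale it by whatever reward the trainer chose.
    grad = -probs
    grad[a] += 1.0
    logits += lr * r * grad

print({name: round(p, 3) for name, p in zip(actions, softmax(logits))})
# Most of the probability mass ends up on "cooperate", simply because
# that is what the chosen reward signal reinforced.
```

Of course, this is exactly where the nitpick above bites: the update only sees sampled actions and the scores attached to them, so rewarding “the right thing for the right reasons” requires the reward signal itself to somehow track reasons, not just outward behavior.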