We decide what loss functions to train the AIs with. It’s not as though AIs have some inbuilt reward circuitry, specified by evolution, that pushes them to maximize their reproductive fitness. We can simply choose to reinforce cooperative behavior.
I think this creates a massive power disparity between us and the AIs, in our favor. Someone with total control over your reward circuitry would have an enormous advantage over you.
Maybe a nitpick, but ideally the reinforcement shouldn’t just be based on “behavior”; you want to reward the agent when it does the right thing for the right reasons. Right? (Or maybe you’re defining “cooperative behavior” to include not just the external behavior but also the underlying motivations?)
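To make the first point concrete, here is a toy sketch of what “we choose what gets reinforced” looks like in code. Everything in it is a hypothetical stand-in: a hand-picked reward table over three made-up actions, a tabular softmax policy, and a plain REINFORCE-style update. It is not anyone’s actual training setup, just an illustration that the policy ends up favoring whatever the trainer’s reward signal rewards.

```python
import numpy as np

# Toy illustration: the trainer, not evolution, picks the reward signal.
# The action set and reward values below are hypothetical stand-ins.

rng = np.random.default_rng(0)

actions = ["cooperate", "deceive", "refuse"]
# Reward table chosen by the trainer; this plays the role of
# "the loss function we decide to train with".
reward = {"cooperate": 1.0, "deceive": -1.0, "refuse": 0.0}

logits = np.zeros(len(actions))  # tabular softmax policy parameters
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(logits)
    a = rng.choice(len(actions), p=probs)
    r = reward[actions[a]]
    # REINFORCE update: grad of log pi(a) for a softmax policy is
    # onehot(a) - probs; scale it by whatever reward the trainer chose.
    grad = -probs
    grad[a] += 1.0
    logits += lr * r * grad

print({name: round(p, 3) for name, p in zip(actions, softmax(logits))})
# Most of the probability mass ends up on "cooperate", simply because
# that is what the chosen reward signal reinforced.
```

Of course, this is exactly where the nitpick above bites: the update only sees sampled actions and the scores attached to them, so rewarding “the right thing for the right reasons” requires the reward signal itself to somehow track reasons, not just outward behavior.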