They also aren’t facing the same incentive landscape humans are. You talk later about evolution selecting for selfishness; not only is the story for humans far more complicated (why do humans often offer an even split in the ultimatum game?), but humans also talk a nicer game than they act (see construal level theory, or social-desirability bias). Once you start looking at AI agents who have affordances and incentives similar to those humans have, I think you’ll see a lot of the same behaviors.
The answer for the ultimatum game is probably that the cultural values of a lot of rich nations tend towards more even splits, so the result isn’t as universal as you might think:
https://www.lesswrong.com/posts/syRATXbXeJxdMwQBD/link-westerners-may-be-terrible-experimental-psychology
I definitely agree that humans talk a nicer game than they act, for a combination of reasons, and that this will apply to AGIs as well.
That said, to the extent the incentive landscapes are different, I think they will probably tend to favor obedience toward an AGI’s owners (while the AGI remains quite capable), because early on AGIs have much less control over their own values than humans do, so a lot of the initial selection pressure comes from both automated environments and human training data pointing toward particular values.