Just noting the risk that the AIs could learn verifiable cooperation/coordination rather than kindness. This would probably be incentivized by the training (“you don’t profit from being nice to a cooperate-rock”), and could easily cut humans out of the trades that AI make with one another.
AIs could learn to cooperate with perfect selfishness, but humans and AIs usually learn easier to compute heuristics / “value shards” early in training, which persist to some extent after the agent discovers the true optimal policy, although reflection or continued training could stamp out the value shards later.
Just noting the risk that the AIs could learn verifiable cooperation/coordination rather than kindness. This would probably be incentivized by the training (“you don’t profit from being nice to a cooperate-rock”), and could easily cut humans out of the trades that AI make with one another.
AIs could learn to cooperate with perfect selfishness, but humans and AIs usually learn easier to compute heuristics / “value shards” early in training, which persist to some extent after the agent discovers the true optimal policy, although reflection or continued training could stamp out the value shards later.
maybe, but if the ai is playing a hard competitive game it will directly learn to be destructively ruthless