I guess we are converging. I'm just pointing out flaws in this option, but I also can't give a better solution off the top of my head. At least this won't insta-kill us, assuming that real-world humans count as non-copyable agents (how does that generalize again? Are you sure RL agents can just learn our definition of an agent correctly, and that it won't include stuff like ants?), and that they can't get excessive virtual resources from our world without our cooperation (otherwise a substantial number of agents go rogue, and some of them get punished, but some get through). I still think we can do better than this though, somehow.
(Also if with “ban all AI work” you’re referring to the open letter thing, that’s not really what it’s trying to do, but sure)
The reason it would have to avoid general harm is not the negative reward, but rather the general bias for cooperation that applies to both copyable and non-copyable agents.
How does non-harm follow from cooperation? If we remove the "negative reward for killing" part, what stops them from randomly killing agents (with everyone else believing it's okay, so no punishment), as long as there are still enough other agents to cooperate with? Grudges? How exactly do those work for harms other than killing?