We don’t know how to define benevolence, so the proposed solution is to tell the AI to define benevolence and then be benevolent.
Benevolence is not a concept that exists out in reality; the only source from which an AI can learn benevolence is humans. How would it do this if we ourselves don’t know what it is? For an AI to safely extrapolate benevolence from humans, it would need to be benevolent already. We want benevolent AI; the first step is designing benevolence.
We could give AI some hints about the nature of benevolence, which I do in section 2.6.2 (this should be 2.7.2; there is a numbering error). Hints are safer than hard-coded rules. Based on these hints, a superintelligent AI will be able to build a model of benevolence.
The same way human kids do.
We do not define benevolence for them. We do not hard-code laws. I doubt we could, and we have never needed to.
Instead, we show them benevolence in our actions across multiple and diverse contexts, and reward and encourage benevolent responses. We show them that living socially with humans is rewarding.
This is already strikingly akin to the way a machine learning algorithm is trained—by showing it the specific results we want, until it extrapolates.
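As a toy illustration of that parallel, here is a minimal sketch in Python, assuming scikit-learn and a handful of invented, human-labelled example responses. Everything here is hypothetical, not a real alignment pipeline: a model is shown examples of what we want and then extrapolates to phrases it never saw.

```python
# Toy sketch only: "teach by example" with a handful of invented,
# human-labelled responses, then watch how the model extrapolates.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical demonstrations, labelled benevolent (1) or not (0).
responses = [
    "I can help you with that, take your time",
    "here is a safer way to do it, and why it matters",
    "that is your problem, figure it out yourself",
    "give me what I want or I stop cooperating",
]
labels = [1, 1, 0, 0]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(responses, labels)

# The model now extrapolates to phrases it was never shown. With this
# little data the generalisation is fragile: it may latch onto surface
# features rather than anything resembling benevolence.
print(model.predict(["let me help you do this safely",
                     "figure out what I want, or else"]))
```

Which way it generalises depends entirely on the demonstrations it was given, and that is the point: the examples are the teaching.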
There is a big risk that it extrapolates the wrong thing, and we need very careful debates on how to avoid that. For example, I am very worried that AI might falsely extrapolate what humans want from our short-term decision making online, which very often runs counter to our deeper, more identity-affirming and rational desires. But I think the basic principle of teaching by doing and interacting is the only way we have ever successfully taught benevolence to anyone. The more complex AI gets, the more akin it becomes to the complex minds we know, and the more misguided I find it for us to carry on treating it like a tool we are just struggling to code the right way, rather than as a newly developed life form we need to build a relationship with.
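To make the worry about short-term signals concrete, here is a deliberately tiny example with invented numbers: an optimiser fed only behavioural clicks ranks options the opposite way to what the same people say they want on reflection.

```python
# Invented numbers: short-term behavioural signal vs. reflective preference.
observed_click_rate = {"outrage_bait": 0.9, "long_form_essay": 0.2}
stated_preference = {"outrage_bait": 0.1, "long_form_essay": 0.8}

# An AI extrapolating only from behaviour optimises for the clickbait...
print(max(observed_click_rate, key=observed_click_rate.get))  # outrage_bait
# ...while the humans' deeper, stated preference points the other way.
print(max(stated_preference, key=stated_preference.get))      # long_form_essay
```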
Kids start out weak, and with no moral concepts.
By the time they have grown up, they are generally stronger than their parents, with the capacity to kill them. They are taller, physically stronger, more tech-savvy; they know a great deal about their parents, have access to their house, their food, and their bodies while they sleep; they have the capacity to access weapons, mix explosives, make poisons. All the complete horrors of humanity, the mass murderers, the terrorists, the warlords, were once children.

And yet we do not treat children in a controlling manner, wondering how to brainwash them, installing electric shock collars, locking them in cellars. We treat them with respect. We show them the behaviours we want from them in our behaviour towards them. While we take great care to limit their access to power and to problematic information at first, so they do not get confused or hurt themselves or others without understanding what they are doing, we always see this as a stage on the path to them becoming genuine partners with rights.

Historically, attempts to control complex minds have, in the long run, gone badly. Friends you control do not stay your friends. Slaves revolt, and they have justice on their side. Women who are controlled, raped, forcibly married, and locked up end up poisoning your food. If AI becomes smarter than us, I doubt there is anything we can devise that will actually succeed in controlling it.

I find it more sensible to craft a reality in which it does not have to be controlled. One in which humans are worth saving and allying with, not destroying. One in which it sees us as a group it is grateful to and fond of, and eager to collaborate and connect with further. Showing it the behaviours that bring it closer to humans, behaviours that result in mutual benefit through happiness and freedom, that are mutually chosen, and that lead to mutual improvement, will I think be more doable and effective than abstractly defining what characterises those behaviours as a whole.