Alerus comments on Is friendly AI “trivial” if the AI cannot rewire human values?

Alerus 10 May 2012 20:22 UTC
You’re missing the point of talking about opposition. The AI doesn’t want the outcome of opposition because, unlike the Nazis, it cares that opposition has terrible effects on the well-being it’s trying to maximize. This isn’t about winning the war; it’s about the consequences of war for the measured well-being of the people involved, and of everyone else living in a society where an AI would kill people for what amounted to thought-crime.

And if the machine thinks that’s the best way to make people happy (for whatever horrible reason—perhaps it is convinced by the Repugnant Conclusion and wants to maximize utility by wiping out all the immiserated Russians), we’re still in trouble.

This specifically violates the assumption that the AI models well how any given human measures their own well-being.

However, if you’re trying to describe an AI that is set to maximize human value, understands the complexities of the human mind, and won’t make such mistakes, then you are describing friendly AI.

The assumption is that it models human well-being at least as well as the best human can model another person’s well-being function. However, this constraint by itself does not solve friendly AI, because in a less constrained version of the problem I outlined, the most common objection is that an AI trying to maximize what humans value will change and rewire those values into something easier to maximize. The entire purpose of this post is to ask whether it could achieve this without the ability to directly rewire human values (e.g., could it be done through persuasion?). In other words, you’re claiming friendly AI is easier to solve than the constrained question I posed in the post.
- Swimmy 11 May 2012 2:01 UTC
  Are you trying to argue that, of all the humans who have done horrible, horrible things, not a single one both 1) modeled other humans at or above the level at which humans usually model each other and 2) thought they were trying to make the world better? Or are you trying to argue that not a single one of them ever caused an existential threat?
  
  My guess is that Lenin, for instance, had an above-average human-modeling mind and thought he was taking the first steps toward bringing the whole world into a new, prosperous era free from class war and imperialism. He was wrong, and thousands of people died. The kulaks resisted by destroying their own farms. Lenin probably didn’t “want the outcome of opposition,” but that didn’t stop him from thinking mass slaughter was the solution.
  
  The ability to model the well-being of humans and the “friendliness” of the AI are the same thing, provided the AI is programmed to maximize that well-being value. If your AI can never make mistakes like that, it’s a friendly AI. If it can, it’s trouble whether or not it can alter human values.