There is surprisingly little incentive for selfish AI writers to tilt the friendliness towards themselves. Consider these four outcomes: an AI is created that isn’t friendly to anyone; no AI is created; an AI is created that’s friendly to all humans; or an AI is created that’s friendly only to its creator. For a selfish programmer, these are listed from least to most preferred.
The differences between the first three outcomes are huge: death and extinction, versus the status quo, versus superoptimization. But the difference between an AI friendly to all humans and an AI friendly to just its creator is small; normal human preferences are mostly compatible and don’t require astronomical resources, so making everyone else happy too would cost very little. Meanwhile, making an AI that’s friendly only to its creator is riskier and less likely to succeed than making one that’s friendly to everyone: the part that distinguishes the creator from other humans may have bugs (especially if the creator later tries to self-modify), the creator can’t recruit help, and other humans may try to stop it. It also creates a time window during which, if the creator dies or suffers brain damage, the AI ends up unfriendly to everyone (including the creator).
So making a selectively friendly AI just seems like a stupid idea, even before you get to the moral arguments. And the moral arguments point the same way. I’m much less worried about someone making an AI selfishly than I am about someone making an AI stupidly or carelessly, which is a real danger and one that can’t be defused by any philosophical argument.
There is surprisingly little incentive for selfish AI writers to tilt the friendliness towards themselves.
For normal humans and at the end of the game, I agree with you. However, there are two situations where people may want tilt:
Narcissists seem to have an unlimited appetite for adoration from others. That might translate into a desire to get the AI tilted as much as possible in their favor. They are about 1% of the population according to the abnormal-psych literature, but in my experience a much larger fraction of the population is subclinically narcissistic enough to be a problem.
If there’s a slow takeoff, the AI will be weak for some period of time. During this time, the argument that it controls enough resources to satisfy everyone doesn’t hold. If the organization building it has no other currency available to pay people for help, it might pay in tilt. If the tilt decays toward zero at some rate, we could end up with something that is fair in the long run (a toy sketch of that decay is below). I don’t know how to reconcile that with the scheme described in another comment for dealing with utility monsters by tilting away from them.
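To make the decay idea concrete, here is a toy sketch in Python. The exponential half-life, the specific weights, and the names are assumptions I’m making purely for illustration, not part of any actual proposal:

```python
def aggregate_utility(personal_utilities, tilt_recipients, t, alpha=1.0, halflife=10.0):
    """Toy social-welfare function with a 'tilt' that decays toward zero.

    personal_utilities: dict mapping person -> that person's utility (assumed comparable)
    tilt_recipients: set of people who were paid in tilt (e.g. early helpers)
    t: time elapsed since the payment, in the same units as halflife
    alpha: initial extra weight granted as payment
    halflife: time for the extra weight to halve

    Everyone gets weight 1; tilt recipients get a bonus weight that decays
    exponentially, so the aggregation approaches plain equal weighting over time.
    """
    decay = 0.5 ** (t / halflife)
    total = 0.0
    for person, utility in personal_utilities.items():
        bonus = alpha * decay if person in tilt_recipients else 0.0
        total += (1.0 + bonus) * utility
    return total


# Early on (t=0) the creator's preferences count double; much later the
# bonus has decayed and everyone is weighted almost equally.
utilities = {"creator": 1.0, "alice": 1.0, "bob": 1.0}
print(aggregate_utility(utilities, {"creator"}, t=0))    # 4.0
print(aggregate_utility(utilities, {"creator"}, t=100))  # ~3.001
```

Whether anything like this would actually read as “fair” depends on how fast the decay runs relative to how long the AI stays weak, which is exactly the part I don’t know how to pin down.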
It also creates a time window during which, if the creator dies or suffers brain damage, the AI ends up unfriendly to everyone (including the creator).
I agree that there will be windows like that. To avoid one, we would need a committee taking the lead, with well-defined procedures that allow a member of the committee to be replaced if the others judge him to be insane or deceptive. Given how poorly committee decision-making works, I don’t know whether that presents more or less risk than simply having one leader and accepting the risk of him going insane. The size of the window depends on whether there’s a hard or soft takeoff, and I don’t know which of those to expect.
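To make “well-defined procedures” slightly more concrete, here is a toy sketch of one possible replacement rule. The two-thirds threshold and the rule that the accused member doesn’t get a vote are illustrative assumptions on my part, not a worked-out governance proposal:

```python
def replace_member(committee, accused, votes_to_remove):
    """Toy committee-replacement rule.

    committee: list of member names
    accused: the member the others judge to be insane or deceptive
    votes_to_remove: set of members voting for removal (the accused's vote is ignored)

    Removal requires a two-thirds supermajority of the *other* members,
    so a bare-majority coalition can't oust someone on its own.
    """
    others = [m for m in committee if m != accused]
    valid_votes = {v for v in votes_to_remove if v in others}
    if len(valid_votes) * 3 >= 2 * len(others):
        return [m for m in committee if m != accused]  # accused removed
    return list(committee)  # vote fails; committee unchanged
```

Even with a rule like this, the committee can still be slow or wrong, which is why I’m unsure it beats a single leader.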