It seems that privacy potentially could “tame” a not-quite-corrigible AI. With a full model, the AGI might receive a request, deduce that activating a certain set of neurons strongly would be the most robust way to make you feel the request was fulfilled, and then design an electrode set-up to accomplish that. Whereas the same AI with a weak model wouldn’t be able to think of anything like that, and might resort to fulfilling the request in a more “normal” way. This doesn’t seem that great, but it does seem to me like this is actually part of what makes humans relatively corrigible.
Part of it seems like a matter of alignment. It seems like there’s a difference between
Someone getting someone else to do something they wouldn’t normally do, especially under false pretenses (or as part of a deal where the other side never gets held up)
and
Someone choosing to go to an oracle AI (or doctor) and saying “How do I beat this addiction that’s ruining my life*?”
*There are some scary stories about what people have been willing to do to try to solve that problem, including brain surgery.
Yeah, I also see “manipulation” in the bad sense of the word as “making me do X without me knowing that I am being pushed towards X”. (Or, in more coercive situations, with me knowing and disagreeing with the goal, but being unable to do anything about it.)
Teaching people, coaching them, curing their addictions, and so on: as long as this is explicitly what they wanted (without any hidden extras), it is “manipulation” in the technical sense of the word, but it is not evil.