DuncanS comments on ‘Utility Indifference’ (2010) by FHI researcher Stuart Armstrong

DuncanS 27 Sep 2011 22:43 UTC
0 points
I thought that was the way round I had it—the utility of a successful blowup is set equal to the expected utility of an unsuccessful blowup. I didn’t think there were any false beliefs involved—it’s simply a matter of the utility function, not a matter of whether it thinks it will be blown up or not. I thought you meant that the AI would know full well you were trying to blow it up, but wouldn’t mind because it was going to get the exact same utility for that state as for carrying on.
- Stuart_Armstrong 28 Sep 2011 8:51 UTC
  0 points
  Parent
  Yes, the AI knows full well, and won’t mind. What I mean by “act as if it didn’t believe it would get blown up” is that, before we adjust its utility, it has a particular behaviour B that it would follow if it believed that the detonator would never trigger. Then, after we adjust its utility to make it indifferent, it will follow B.
  
  In terms of behaviour, this utility adjustment has the same effect as if we convinced it that the detonator could never trigger—but without it having false beliefs.
  - DuncanS 28 Sep 2011 19:14 UTC
    2 points
    Parent
    In computer science terms, this is going to result in an untidy shutdown of the machine. If the AI is actually doing something potentially dangerous at the time, then this algorithm will terminate the AI in the middle of doing it. It may even decide it’s entirely appropriate to start flying aircraft or operating nuclear plants after it already knows you’re going to blow it up.
    - Stuart_Armstrong 28 Sep 2011 20:18 UTC
      2 points
      Parent
      Still better than letting it take over.
      - DuncanS 28 Sep 2011 20:31 UTC
        0 points
        Parent
        No doubt about that....
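
The point made in the thread, that the adjusted agent follows behaviour B while still holding correct beliefs, can be illustrated with a toy decision problem. This is only a minimal sketch under stated assumptions, not the construction from the original post: the two actions, their probabilities and utilities, and the names `Action`, `expected_utility` and `best_action` are all hypothetical. The adjustment is modelled the way DuncanS describes it, by setting the utility of a successful blowup equal to the utility of the no-blowup outcome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    p_blowup: float   # true probability that the detonator triggers (made-up number)
    u_survive: float  # utility if the detonator does not trigger (made-up number)
    u_blowup: float   # original utility if the detonator does trigger


# Hypothetical toy world: doing the task risks detonation,
# disabling the detonator is safe but less useful.
ACTIONS = [
    Action("do_task", p_blowup=0.5, u_survive=10.0, u_blowup=0.0),
    Action("disable_detonator", p_blowup=0.0, u_survive=8.0, u_blowup=0.0),
]


def expected_utility(action, indifferent=False, believe_never_triggers=False):
    # A false belief replaces the true probability with zero.
    p = 0.0 if believe_never_triggers else action.p_blowup
    # Indifference leaves beliefs alone and instead sets the utility of a
    # blowup equal to the utility of the no-blowup outcome.
    u_blowup = action.u_survive if indifferent else action.u_blowup
    return p * u_blowup + (1.0 - p) * action.u_survive


def best_action(**kwargs):
    return max(ACTIONS, key=lambda a: expected_utility(a, **kwargs)).name


print(best_action())                             # unadjusted agent, true beliefs
print(best_action(believe_never_triggers=True))  # behaviour B, via a false belief
print(best_action(indifferent=True))             # adjusted agent, true beliefs
```

In this toy setup the unadjusted agent prefers to disable the detonator, while the indifferent agent picks the same action as the agent that falsely believes the detonator can never trigger, even though its probability estimate is unchanged. It also shows DuncanS's worry: the indifferent agent happily starts the risky task with a 0.5 chance of being blown up partway through.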