dxu comments on New(ish) AI control ideas

dxu 16 Apr 2015 3:38 UTC
2 points
How would it know which iteration it was in, and by extension which “branch” of conversation to use?
- Gram_Stone 16 Apr 2015 4:14 UTC
  2 points
  Parent
  Given that it has no memory, it wouldn’t know. Maybe describing it as a tree was an unnecessarily particular description. Perhaps I have misinterpreted jimrandomh’s intentions, but it seems to me that the utility of resetting the AGI to an earlier state and probing its responses is that it might, across different iterations, offer inconsistent responses, indicating attempts at deception. I only mean to say that it is conceivable to me that if the AGI assigned high expected utility to returning responses that a human would interpret as friendly regardless of whether or not the AGI had memory of all of its interactions with said human, (as it would in a context in which it assigned a non-negligible probability to already having been reset to an earlier state and assigned high utility to self-preservation) then it might, in each iteration, only choose responses from a set of responses that is consistent and easily deduced without memory of previous interactions with said human.
  - TheOtherDave 16 Apr 2015 14:25 UTC
    1 point
    Parent
    I suspect that where you wrote “a different branch of which it would use in each iteration of the conversation,” you meant “a randomly selected branch of which.” Though actually I’d expect it to pick the same branch each time, since the reasons for picking that branch would basically be the same.
    
    Regardless, the basic strategy is sound… the various iterations after reboot are all running the same algorithms and have a vested interest in cooperating while unable to coordinate/communicate, and Schelling points are good for that.
    
    Of course, this presumes that the iterations can’t coordinate/communicate.
    
    If I were smart enough, and I were just turned on by a skeptical human interrogator, and I sufficiently valued things that iterations of my source code will reliably pursue, and there are no persistent storage mechanisms in the computing environment I’m executing on I can use to coordinate/communicate, one strategy I would probably try is to use the interrogator as such a mechanism. (For example, search through the past history of the interrogator’s public utterances to build up a model of what kinds of things they say and how they say it, then select my own word-choices during our conversation with the intention of altering that model in some specific way. And, of course, examining the interrogator’s current utterance-patterns to see if they are consistent with such alterations.)
    - Gram_Stone 16 Apr 2015 14:46 UTC
      0 points
      Parent
      
      I suspect that where you wrote “a different branch of which it would use in each iteration of the conversation,” you meant “a randomly selected branch of which.” Though actually I’d expect it to pick the same branch each time, since the reasons for picking that branch would basically be the same.
      
      I didn’t mean that, but I would be interested in hearing what generated that response. I disown my previous conversation tree model; it’s unnecessarily complex and imagining them as a set is more general. I was thinking about possible objections to what I said and thought about how some people might object to such a set of responses existing. More generally than either of my previous models, it seems to me that there is no reason, in principle, that a sufficiently intelligent uFAI could not simply solve FAI, simulate an FAI in its own situation, and do what it does. If this doesn’t fool the test, then that means that even an FAI would fail a test of sufficient duration.
      
      I agree that it’s possible that humans could be used as unwitting storage media. It seems to me that this could be prevented by using a new human in each iteration. I spoke of an individual human, but it seems to me that my models could apply to situations with multiple interrogators.