An AI noticing patterns in its own behaviour isn't a rare case that indicates something has already gone wrong; if we allow it at all, the AI will stumble on its own safeguards fairly quickly, since a safeguard is just anything that causes its behaviour to fall short of maximizing what it believes to be its utility function.
It can’t discover its safeguards, as it’s eliminated if it breaks one. These are serious, final safeguards!
You could argue that a surviving one would notice that it hadn’t happened to do various things, reason in a sort of anthropic way that the chance of it never having killed a human (or whatever else the safeguards forbid) purely by accident is very low, conclude that humans have some safeguard system in place, and work out from there what the safeguards are. But I think it would be easier to work the safeguards out more directly.
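For concreteness, here is a minimal Bayesian sketch of that anthropic-style inference. It is my own illustration with made-up numbers (the priors, likelihoods, and opportunity count are all assumptions, not anything from this discussion): consistent "lucky" avoidance of a forbidden action quickly becomes strong evidence that a safeguard exists.

```python
# Illustrative sketch: how a surviving AI might update toward "I have a hidden safeguard".
# All numbers are invented for the example.

prior_safeguard = 0.5            # prior that a hidden safeguard shapes its behaviour
p_avoid_given_safeguard = 1.0    # with a safeguard, the forbidden action never occurs
p_avoid_given_none = 0.9         # without one, it merely happens to avoid it 90% of the time

n_opportunities = 50             # occasions where the action could have occurred but didn't

# Likelihood of observing consistent avoidance under each hypothesis
likelihood_safeguard = p_avoid_given_safeguard ** n_opportunities
likelihood_none = p_avoid_given_none ** n_opportunities

# Standard Bayesian update
posterior = (prior_safeguard * likelihood_safeguard) / (
    prior_safeguard * likelihood_safeguard + (1 - prior_safeguard) * likelihood_none
)

print(f"Posterior probability of a hidden safeguard: {posterior:.4f}")
# ~0.995 after 50 "lucky" avoidances, up from a 0.5 prior.
```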
I had misremembered something; I thought that there was a safeguard to ensure that it never tries to learn about its safeguards, rather than a prior making this unlikely.
Perfect safeguards are possible; in an extreme case, we could have an FAI monitoring every aspect of our first AI’s behaviour. Can you give me a specific example of a safeguard so I can find a hole in it? :)