Not quite. I’m assuming you also try to make it so it wouldn’t act like that in the first place, so if it WANTS to do that, you’ve gone wrong. That’s the underlying issue: to identify dangerous tendencies and stop them growing at all, rather than trying to redirect them.
An AI noticing patterns in its own behaviour is not a rare case that indicates something has already gone wrong. If we allow it at all, the AI will accidentally discover its own safeguards fairly quickly: they are anything that causes its behaviour to fall short of maximizing what it believes to be its utility function.
It can’t discover its safeguards, as it’s eliminated if it breaks one. These are serious, final safeguards!
You could argue that a surviving one would notice that it hadn’t happened to do various things, and would form a sort of anthropic argument: the chance of it never having happened to kill a human (or whatever the safeguards forbid) is very low, so humans must have some safeguard system, and it could work out from there what the safeguards are. But I think it would be easier to work the safeguards out more directly.
I had misremembered something; I thought that there was a safeguard to ensure that it never tries to learn about its safeguards, rather than a prior making this unlikely.
Perfect safeguards are possible; in an extreme case, we could have a FAI monitoring every aspect of our first AI’s behaviour. Can you give me a specific example of a safeguard so I can find a hole in it? :)