I had misremembered something; I thought there was a safeguard ensuring the AI never tries to learn about its safeguards, rather than just a prior making this unlikely.
Perfect safeguards are possible; in an extreme case, we could have an FAI monitoring every aspect of our first AI’s behaviour. Can you give me a specific example of a safeguard so I can find a hole in it? :)