On the other hand, there's no particular reason I can see why Betty should continue to self-improve.
A subgoal that is useful for achieving many primary goals is improving one's general goal-achieving ability.
An attempt to cripple an FAI by limiting its general intelligence would be noticed, because the humans would expect it to FOOM, and if it actually does FOOM, it will be smart enough that the limitation no longer matters.
A sneakier unfriendly AI might try to design an FAI with a stupid prior, with blind spots the uFAI can exploit. So you would want your Friendliness test to look not just at the goal system, but at every module of the supposed FAI, including its epistemology and decision theory.
But even a thorough test does not make it a good idea to run a supposed FAI designed by a uFAI. Doing so would let the uFAI exploit, for its own purposes, every bit of uncertainty we have about the supposed FAI.