Welcome to Less Wrong!
In a sense, the Friendly AI problem is about delegating the definition of Friendliness to a superintelligence. The main issue is that it’s easy to underestimate (on account of the Mind Projection Fallacy) how large a kernel of the correct answer it needs to start off with, in order for that delegation to work properly. There’s rather a lot that goes into this, and unfortunately it’s scattered over many posts that aren’t collected in one sequence, but you can find much of it linked from Fake Fake Utility Functions (sic, and not a typo) and Value is Fragile.