I don’t think all AI catastrophes come from oversimplification of value functions. Suppose we had 1000 weak preferences, $u_0, \dots, u_{999}$, with $U_{\text{total}} = \sum_{i=0}^{999} u_i$, each of which is supposed to satisfy $u_i \in [0,1]$. But due to some weird glitch in the definition of $u_{42}$, it has an unforeseen maximum of 1,000,000, and that maximum is paperclips. In this scenario, the AI is only as friendly as the least friendly piece.
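A minimal sketch of that failure mode (the optimizer, the numbers, and the "paperclip" stand-in are all hypothetical, just to make the aggregation concrete): 999 well-behaved terms are capped at 1, one glitched term is not, and anything maximizing the sum ends up serving the glitched term.

```python
import numpy as np

rng = np.random.default_rng(0)

def u_normal(i, world):
    # Well-behaved preference: bounded in [0, 1] by construction.
    return float(np.clip(world[i], 0.0, 1.0))

def u_42_glitched(world):
    # Intended to be in [0, 1], but a missing clip lets it reach ~1,000,000
    # when dimension 42 of the world (the "paperclip" axis) is maxed out.
    return 1_000_000.0 * world[42]  # oops: no np.clip here

def u_total(world):
    total = sum(u_normal(i, world) for i in range(1000) if i != 42)
    return total + u_42_glitched(world)

# A crude optimizer: sample random worlds, keep the highest-scoring one.
best = max((rng.random(1000) for _ in range(2000)), key=u_total)

# The winner is whichever world scored highest on u_42; the other 999
# preferences contribute at most 999 in total and are effectively ignored.
print(u_42_glitched(best), u_total(best))
```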
Alternatively, if the value of each $u_i$ is linear or convex in the resources spent maximizing it, or other technical conditions hold, then the AI just picks a single $u_i$ to focus all resources on. If some term is very easily satisfied, say $u_3$ is a slight preference that it not wipe out all beetles, then we get a few beetles living in a little beetle box, and 99.99...% of resources turned into whatever kind of paperclip it would otherwise have made.
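To make the resource argument concrete, here is a small sketch under the linear assumption (the marginal returns are made up): maximizing $\sum_i r_i x_i$ under a budget on $\sum_i x_i$ is a linear program, and its optimum is a vertex that puts the entire budget on the single term with the best marginal return.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Hypothetical marginal returns: utility gained per unit of resource
# spent on each of 1000 preferences, each linear in resources spent.
returns = rng.random(1000)
budget = 1.0

# Maximize sum_i returns[i] * x_i  s.t.  sum_i x_i <= budget, x_i >= 0.
# linprog minimizes, so negate the objective.
res = linprog(c=-returns,
              A_ub=np.ones((1, 1000)), b_ub=[budget],
              bounds=[(0, None)] * 1000, method="highs")

# The entire budget lands on the single preference with the largest
# marginal return; every other u_i gets exactly zero resources.
print("nonzero allocations:", np.flatnonzero(res.x > 1e-9))
print("expected winner:    ", np.argmax(returns))
```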
If we got everyone in the world who is “tech literate” to program a utility function (in some easy-to-use utility-function programming tool?), bounded them all, and summed the lot together, then I suspect that the AI would still do nothing like optimizing human values. (To me, this looks like a disaster waiting to happen.)
I agree. (On that issue, I think a soft min is better than a sum.) However, throwing away $u_i$’s is still a bad idea; my requirement is necessary but not sufficient.
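A sketch of why a soft min behaves better here than a sum (the smoothing constant and the glitched value are made up): a log-sum-exp soft minimum tracks the worst-satisfied preference, so one term with a runaway maximum can't buy up the aggregate the way it can with a straight sum.

```python
import numpy as np

def soft_min(utilities, k=50.0):
    # Smooth approximation of min(utilities): as k grows it approaches
    # the hard minimum, so the aggregate tracks the worst-satisfied term.
    u = np.asarray(utilities, dtype=float)
    return -np.log(np.mean(np.exp(-k * u))) / k

# 999 well-behaved preferences near their cap, plus one glitched term
# (u_42) that has blown past its intended [0, 1] bound.
honest = np.full(999, 0.9)
glitched = 1_000_000.0

with_sum      = honest.sum() + glitched                 # dominated by the glitch
with_soft_min = soft_min(np.append(honest, glitched))   # stays near 0.9

print(with_sum, with_soft_min)

# Flip it around: if some honest u_i is driven to 0 (beetles wiped out)
# while the glitched term stays huge, the soft min drops sharply toward
# the worst-satisfied term, whereas the sum barely notices.
honest[0] = 0.0
print(soft_min(np.append(honest, glitched)))
```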