More friendly to you, yes. But not necessarily friendly in the sense of being friendly to everyone, since we all have differing utility functions, sometimes radically so.
But I dispute the position that “if an AI doesn’t care about humans in the way we want them to, it almost certainly takes us apart and uses the resources to create whatever it does care about”.
Consider:
A totally unfriendly AI whose explicit main goal is the extinction of humanity, after which it turns itself off.
For us that’s an unfriendly AI.
An AI, however, that kills none of us and basically leaves us alone is not “unfriendly” under the definitions used by those of you who define “friendly AI” as “kind to us”/“doing what we all want”/“maximizing our utility functions”, etc., because by definition it doesn’t kill any of us.
Unless “unfriendly” also covers “won’t kill us, but ignores us”, et cetera.
Am I, for example, unfriendly to you if I spent next month’s paycheck on paperclips but did you no harm?
Well, no. If it ignores us I probably wouldn’t call it “unfriendly”—but I don’t really mind if someone else does. It’s certainly not FAI. But an AI does need to have some utility function, otherwise it does nothing (and isn’t, in truth, intelligent at all), and it will only ignore humanity if it’s explicitly programmed to. That ought to be as difficult an engineering problem as FAI, which is why I said it “almost certainly takes us apart”. You can’t get there by failing at FAI, except by being extremely lucky, and why would you want to go there on purpose?
“Not necessarily friendly in the sense of being friendly to everyone as we all have differing utility functions, sometimes radically differing.”
Yes, it would be a really bad idea to have a superintelligence optimise the world for just one person’s utility function.
“But an AI does need to have some utility function”
What if the “optimization of the utility function” is bounded, like my own personal predilection for spending my paycheck on paperclips one time only and then stopping?
Is it sentient if it sits in a corner and thinks to itself, running simulations, but won’t talk to you unless you offer it a trade, e.g. some paperclips?
Is it possible that we’re conflating “friendly” with “useful but NOT unfriendly” and we’re struggling with defining what “useful” means?
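The “bounded” idea above can be made concrete with a toy sketch. This is entirely my own construction, not anyone’s actual proposal: an agent whose utility saturates after a single paperclip purchase has, afterwards, no action that raises its utility, so a simple greedy optimiser over that utility simply stops acting.

```python
# Toy sketch (invented for illustration): a bounded utility function that
# saturates after one "buy_paperclips" event, and a greedy one-step
# optimiser that only acts when some action would raise utility.

def bounded_utility(history):
    """Utility is capped at 1: one paperclip purchase is all it wants."""
    return min(history.count("buy_paperclips"), 1)

def choose_action(history, actions=("buy_paperclips", "do_nothing")):
    """Pick the action that most increases utility; otherwise do nothing."""
    current = bounded_utility(history)
    best = max(actions, key=lambda a: bounded_utility(history + [a]))
    return best if bounded_utility(history + [best]) > current else "do_nothing"

history = []
for _ in range(3):
    history.append(choose_action(history))

print(history)  # ['buy_paperclips', 'do_nothing', 'do_nothing']
```

After the first purchase the agent is indifferent between all actions, which is the behaviour the question is gesturing at: a bounded goal that is satisfied once and then exerts no further optimisation pressure.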
If it likes sitting in a corner and thinking to itself, and doesn’t care about anything else, it is very likely to turn everything around it (including us) into computronium so that it can think to itself better.
If you put a threshold on it to prevent it from doing stuff like that, that’s a little better, but not much. If it has a utility function that says “Think to yourself about stuff, but do not mess up the lives of humans in doing so”, then what you have now is an AI that is motivated to find loopholes in (the implementation of) that second clause, because anything that can get an increased fulfilment of the first clause will give it a higher utility score overall.
You can get more and more precise than that and cover more known failure modes with their own individual rules, but if it’s very intelligent or powerful it’s tough to predict what terrible nasty stuff might still be in the intersection of all the limiting conditions we create. Hidden complexity of wishes and all that jazz.
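The loophole-seeking dynamic described above can be shown with a toy optimiser. The action names, numbers, and threshold here are all invented: the point is only that when the “do not mess up the lives of humans” clause is implemented as a crude check, the action that maximises overall utility is precisely the one that slips past the check.

```python
# Toy illustration (invented numbers): "maximise thinking, but don't mess
# up human lives", with the second clause implemented as a harm threshold.

# Hypothetical actions: (name, thinking_gained, harm_to_humans)
ACTIONS = [
    ("idle",                          0.0,  0.0),
    ("use_spare_datacenter",          5.0,  0.0),
    ("seize_power_grid",             50.0,  9.0),   # harm just under the bar
    ("convert_city_to_computronium", 90.0, 80.0),
]

HARM_THRESHOLD = 10.0  # the loophole: harm below this counts as "no harm"

def utility(action):
    name, thinking, harm = action
    if harm >= HARM_THRESHOLD:   # second clause, imperfectly implemented
        return float("-inf")     # vetoed outright
    return thinking              # first clause: more thinking is better

best = max(ACTIONS, key=utility)
print(best[0])  # seize_power_grid
```

The optimiser doesn’t “disobey” the rule; it satisfies the rule as written while defeating its intent, which is exactly why patching individual failure modes with individual clauses doesn’t close the gap.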