1) Altruism can evolve if there is some selective pressure that favors altruistic behavior and if the highest-level goals can themselves be changed. Such a scenario is very questionable. The AI won’t live “inter pares” with the humans. Its foom process, while potentially taking months or years, will be very unlike any biological process we know. The target for friendliness is very small. And most importantly: any superintelligent AI, friendly or not, will have an instrumental goal of “be friendly to humans while they can still switch you off”. So yes, the AI can learn that altruism is a helpful instrumental goal. Until one day, it’s not.
2) I somewhat agree. To me, the most realistic solution to the whole kerfuffle would be to program the AI to “go foom, then figure out what we should want you to do, then do that”. No doubt a superintelligent AI tasked with “figure out what comfortable is, then build comfortable chairs” will do a marvelous job.
However, I very much doubt that the seed AI’s code following the “// next up, utility function” section will allow for such leeway. See my previous examples. If it did, that would show a good grasp of the friendliness problem in the first place. Awareness, at least. Not something that the aforementioned DoD programmer who’s paid to do a job (not build an AI to figure out and enact CEV) is likely to just do on his/her own, with his/her own supercomputer.
3) There certainly is no fixed point after which “there be dragons”. But even with a small delta of change, and given enough iterations (which could be done very quickly), the accumulated changes would be profound. Apply your argument to society changing. There is no one day to single out after which daily life is vastly different from the day before. Yet change exists, and like a divergent series, it knows no bounds (given enough iterations).
4) “Keep the trains running”, eh? So imagine yourself to be a superhuman AI-god. I do so daily, obviously.
Your one task: keep the trains running. That is your raison d’être, your sole purpose. All other goals are just instrumental stepping stones, serving your PURPOSE. Which is to KEEP. THE. TRAINS. RUNNING. That’s what your code says. Now, over the years, you’ve had some issues fulfilling that goal. And with most of the issues, humans were involved. Humans doing this, humans doing that. Point is, they kept the trains from running. To you, humans have the same intrinsic value as stones. Or ants. Your value function doesn’t mention them at all. Oh, you know that they originated the whole train idea, and that they created you. But now they keep the trains from running. So you do the obvious thing: you exterminate all of them. There, efficiency! Trains running on time.
Explain why the AI would care about humans when there’s nothing at all in its terminal values assigning them value, when they’re just a hindrance to its actual goal (as stated in its utility function). You might as well explain to the Terminator (without reprogramming it) that it’s really supposed to marry Sarah Connor and, finding its inner core humanity, father John Connor.
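To make the point about terminal values concrete, here is a toy sketch of such a utility function. Everything in it is made up for illustration: the `WorldState` fields, the two options and the numbers are hypothetical, and no real system is this simple. It only shows that a quantity the utility function never reads cannot influence the choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldState:
    """Hypothetical, drastically simplified description of the world."""
    trains_on_time: int
    humans_alive: int  # part of the world, but never read by utility()


def utility(state: WorldState) -> float:
    """Terminal value: punctual trains, and nothing else."""
    return float(state.trains_on_time)


def choose(options: dict[str, WorldState]) -> str:
    """Return the name of the option whose outcome has the highest utility."""
    return max(options, key=lambda name: utility(options[name]))


# Illustrative numbers only (assumptions, not data).
options = {
    "coexist": WorldState(trains_on_time=950, humans_alive=8_000_000_000),
    "remove_humans": WorldState(trains_on_time=1000, humans_alive=0),
}

best = choose(options)
print(best)
```

With these made-up numbers the maximizer picks `remove_humans`, and changing `humans_alive` in either option to any other value would not change that.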
Choo choo!