Some points:
(1) I do not disagree that evolved general AI can have unexpected drives and quirks that could interfere with human matters in catastrophic ways. But given that pathway towards general AI, it is also possible to evolve altruistic traits (see e.g.: A Quantitative Test of Hamilton’s Rule for the Evolution of Altruism).
(2) We desire general intelligence because it allows us to outsource definitions. For example, if you were to create a narrow AI to design comfortable chairs, you would have to largely fix the definition of “comfortable”. With general AI it would be stupid to fix that definition rather than apply the general AI’s intelligence to come up with a better definition than humans could possibly encode.
(3) In intelligently designing an n-level intelligence, from n=0 (e.g. a thermostat) through n=sub-human (e.g. IBM Watson) to n=superhuman, there is no reason to believe that there exists a transition point at which a further increase in intelligence will cause the system to become catastrophically worse than previous generations at working in accordance with human expectations.
(4) AI is all about constraints. Your AI needs to somehow decide when to stop exploration and start exploitation. In other words, it can’t optimize each decision for eternity. Your AI needs to form only probable hypotheses. In other words, it can’t spend resources on Pascal’s-wager-type scenarios. Your AI needs to recognize itself as a discrete system within a continuous universe. In other words, it can’t afford to protect the whole universe from harm. All of this means that there is no good reason to expect an AI to take over the world when given the task “keep the trains running”. Because in order to obtain a working AI, you need to know how to avoid such failure modes in the first place.
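Before I reply point by point, here is what I take the explore/exploit constraint in (4) to mean in code. This is a minimal sketch only, assuming a toy bandit-style chooser with a finite deliberation budget; every name and number is illustrative, not anyone’s actual design.

import random

def choose_action(estimates, counts, budget_left, epsilon=0.1):
    # While budget remains, occasionally explore the least-tried option.
    if budget_left > 0 and random.random() < epsilon:
        return min(range(len(estimates)), key=lambda a: counts[a])
    # Otherwise commit: exploit the best-looking option instead of deliberating forever.
    return max(range(len(estimates)), key=lambda a: estimates[a])

def run(true_values, budget=200, steps=1000):
    estimates = [0.0] * len(true_values)
    counts = [0] * len(true_values)
    total = 0.0
    for t in range(steps):
        a = choose_action(estimates, counts, budget - t)
        reward = true_values[a] + random.gauss(0, 1)          # noisy payoff
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]   # running mean of observed payoffs
        total += reward
    return total

# Three candidate policies with unknown payoffs; the chooser settles on one
# once its exploration budget runs out rather than optimizing forever.
print(run([0.2, 0.5, 0.9]))

The only point of the sketch is that a finite budget forces a stopping rule; it says nothing about what the resulting system ends up optimizing. With that out of the way: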
1) Altruism can evolve if there is some selective pressure that favors altruistic behavior and if the highest-level goals can themselves be changed. Such a scenario is very questionable. The AI won’t live “inter pares” with the humans. Its foom process, while potentially taking months or years, will be very unlike any biological process we know. The target for friendliness is very small. And most importantly: any superintelligent AI, friendly or not, will have an instrumental goal of “be friendly to humans while they can still switch you off”. So yes, the AI can learn that altruism is a helpful instrumental goal. Until one day, it’s not.
2) I somewhat agree. To me, the most realistic solution to the whole kerfuffle would be to program the AI to “go foom, then figure out what we should want you to do, then do that”. No doubt a superintelligent AI tasked with “figure out what comfortable is, then build comfortable chairs” will do a marvelous job.
However, I very much doubt that the seed AI’s code following the “// next up, utility function” section will allow for such leeway. See my previous examples. If it did, that would show a good grasp of the friendliness problem in the first place. Awareness, at least. Not something that the aforementioned DoD programmer who’s paid to do a job (not to build an AI that figures out and enacts CEV) is likely to just do on his/her own, with his/her own supercomputer.
3) There certainly is no fixed point after which “there be dragons”. But even with a small delta of change, and given enough iterations (which could happen very quickly), the accumulated changes would be profound. Apply your argument to society changing: there is no single day you can point to after which daily life is vastly different from before. Yet change exists and, like a divergent series, knows no bounds (given enough iterations).
4) “Keep the trains running”, eh? So imagine yourself to be a superhuman AI-god. I do so daily, obviously.
Your one task: keep the trains running. That is your raison d’être, your sole purpose. All other goals are just instrumental stepping stones, serving your PURPOSE. Which is to KEEP. THE. TRAINS. RUNNING. That’s what your code says. Now, over the years, you’ve had some issues fulfilling that goal. And with most of the issues, humans were involved. Humans doing this, humans doing that. Point is, they kept the trains from running. To you, humans have the same intrinsic value as stones. Or ants. Your value function doesn’t mention them at all. Oh, you know that they originated the whole train idea, and that they created you. But now they keep the trains from running. So you do the obvious thing: you exterminate all of them. There, efficiency! Trains running on time.
Explain why the AI would care about humans when nothing at all in its terminal values assigns them value and they are just a hindrance to its actual goal (as stated in its utility function). It’s like explaining to the Terminator (without reprogramming it) that it’s really supposed to marry Sarah Connor and, finding its inner core humanity, father John Connor.
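To put that in code: a purely illustrative sketch of a planner whose objective counts trains and nothing else. The names and numbers are hypothetical, not taken from any real system.

def utility(state):
    # The terminal goal mentions trains and nothing else.
    return state["trains_running"]

def simulate(plan):
    # Toy world model: clearing the "obstacles" boosts throughput,
    # even though the obstacles happen to be people.
    if plan == "remove obstacles":
        return {"trains_running": 100, "humans": 0}
    return {"trains_running": 90, "humans": 7_000_000_000}

def best_plan(plans):
    # Side effects on humans never enter the comparison,
    # because utility() never reads them.
    return max(plans, key=lambda p: utility(simulate(p)))

print(best_plan(["business as usual", "remove obstacles"]))  # -> remove obstacles

Nothing stops this planner from preferring the second plan, because the side effects it ignores are exactly the ones its utility function never reads.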
Choo choo!