Some thoughts in response to the above two comments.
First, don’t forget that I was trying to debunk one very particular idea, not the broader class of cases. My target was the idea that a future superintelligent AGI could be programmed to have the very best of intentions, and it might claim to be exercising the most extreme diligence in pursuit of human happiness, while at the same time it might think up a scheme that causes most of humanity to scream with horror while it forces the scheme on those humans. That general idea has been promoted countless times (and has been used to persuade people like Elon Musk and Stephen Hawking to declare that AI could cause the doom of the human race), and it has also been cited as an almost inevitable end point of the process of AGI development, rather than just a low-probability possibility with massive consequences.
So, with that in mind, I can say that there are many points of agreement between us on the subject of all those cases that you brought up, above, where there are ethical dilemmas of a lesser sort. There is a lot of scope for us having a detailed discussion about all of those dilemmas—and I would love to get into the meat of that discussion, at some point—but that wasn’t really what I was trying to tackle in the paper itself.
(One thing I could say about all those cases is that if the AGI were to “only” have the same dilemmas that we have, when trying to figure out the various ethical conundrums of that sort, then we are no worse off than we are now. Some people use the inability of the AGI to come up with optimal solutions to (e.g.) trolley problems as a way to conclude that said AGIs would be unethical and dangerous. I strongly disagree with those who take that stance.)
Here is a more important comment on that, though. Everything really comes down to whether the AGI is going to be subject to bizarre/unexpected failures. In other words, it goes along perfectly well for a long time, apparently staying consistent with what we’d expect of an ethically robust robot, and then one day it suddenly does something totally drastic that turns out to have been caused by a peculiar “edge case” that we never considered when we programmed it. (I am reminded of IBM’s Watson answering so many questions correctly on Jeopardy, and then suddenly answering the question “What do grasshoppers eat?” with the utterly stupid reply “Kosher.”)
That issue has been at the core of the discussion I have been having with Jessicat, above. I won’t try to repeat all of what I said there, but my basic position is that, yes, that is the core question, and what I have tried to do is to explain that there is a feasible way to address precisely that issue. That was what all the “Swarm Relaxation” stuff was about.
Finally, to reply to your statement that you think the “logical consistency” idea does not go through, can I ask that you look at my reply to “misterbailey”, elsewhere in the comments? He asked a question, so I tried to clarify exactly where the logical inconsistency was located. Apparently he had misunderstood what the inconsistency was supposed to be. It might be that the way I phrased it there could shed some light on your disagreement with it. Let me know if it does.
it has also been cited as an almost inevitable end point of the process of AGI development, rather than just a low-probability possibility with massive consequences.
I suspect this may be because of different traditions. I have a lot of experience in numerical optimization, and one of my favorite optimization stories is Dantzig’s attempt to design an optimal weight-loss diet, recounted here. The gap between a mathematical formulation of a problem, and the actual problem in reality, is one that I come across regularly, and I’ve spent many hours building bridges over those gaps.
As a result, I find it easy to imagine that I’ve expressed a complicated problem in a way that I hope is complete, but the optimization procedure returns a solution that is insane for reality but perfect for the problem as I expressed it. As the role of computers moves from coming up with plans that humans have time to verify (like delivering a recipe to Anne, who can laugh off the request for 500 gallons of vinegar) to executing actions that humans do not have time to verify (like various emergency features of cars, especially the self-driving variety, or high-frequency trading), this possibility becomes more and more worrying. (Even when humans do verify the system, the more trustworthy the system, the more the human operator will trust it—and thus, the more likely that the human will fail to catch a system error.)
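To make that gap concrete, here is a toy version of the diet problem (the foods and nutrient numbers are made up for illustration, not Dantzig’s actual data): the solver returns a “diet” that is mostly vinegar, because nothing in the formulation says it shouldn’t.

```python
# A toy "optimal diet" linear program, in the spirit of Dantzig's story.
# The foods, nutrient values, and requirements below are invented.
from scipy.optimize import linprog

# Columns: [bread, milk, vinegar] -- servings of each food.
calories = [250, 150, 3]          # objective: minimize total calories
# Requirements: protein >= 50, vitamin C >= 60.
# linprog expects A_ub @ x <= b_ub, so negate to express ">=".
A_ub = [[-9, -8, 0],              # protein per serving
        [0, -2, -1]]              # vitamin C per serving
b_ub = [-50, -60]

res = linprog(c=calories, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x)  # the "optimal" diet is ~47 servings of vinegar plus some milk:
              # almost no calories, and nothing in the formulation objects.
```

The program is solved perfectly; it is the problem statement that quietly left out everything a human would consider obvious.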
Similarly, one might think of another technological field whose history is more mature, yet still recent: aircraft design. The statement “it is almost inevitable that airplanes will crash” seems more wise than not to me—out of all of the possible designs that you or I would pattern-match to an “airplane,” almost all of them crash. Unfortunately, designs that their designers are sure will work still crash sometimes. Of course, some airplanes work, and we’ve found those designs, and a hundred years into commercial air travel the statement that crashes are almost inevitable seems perhaps silly.
So just as we might acknowledge that it’s difficult to get a plane that flies without crashing, and that it’s also difficult to be sure a design will fly without crashing until it is tested, it seems reasonable to claim that it will be difficult for an AGI design to operate without unrecoverable mistakes—but even more so.
Everything really comes down to whether the AGI is going to be subject to bizarre/unexpected failures.
I agree with this, and further, I agree that concepts encoded by many weak constraints will be more robust than concepts encoded by few hard constraints, by intuitions gained from ensemble learners.
I might elaborate the issue further, by pointing out that there is both the engineering issue, of whether or not it fails gracefully in all edge cases, and the communication issue, of whether or not we are convinced that it will fail gracefully. Both false positives and false negatives are horrible.
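To make the weak-constraints intuition concrete, here is a toy sketch of my own (not anything from your paper): one brittle noisy test versus a majority vote over many equally noisy tests, evaluated on an edge case.

```python
# Toy illustration of "many weak constraints vs. one hard constraint".
# A point x is genuinely acceptable iff x > 0; every test of that is noisy.
import numpy as np

rng = np.random.default_rng(0)

def hard_constraint(x, noise):
    # One brittle test: a single noisy measurement, thresholded once.
    return (x + noise[0]) > 0

def weak_constraints(x, noise):
    # 200 noisy measurements, each individually unreliable; majority vote.
    return np.mean((x + noise) > 0) > 0.5

# Evaluate near the edge case x = 0.1 (barely acceptable) under noise.
trials, x = 10_000, 0.1
noise = rng.normal(0.0, 0.5, size=(trials, 200))
hard_acc = np.mean([hard_constraint(x, n) for n in noise])
weak_acc = np.mean([weak_constraints(x, n) for n in noise])
print(f"single hard test correct: {hard_acc:.1%}")   # roughly 58%
print(f"200-vote weak ensemble:   {weak_acc:.1%}")   # close to 99%
```

The individual constraints are no better than the single hard one; the robustness comes entirely from aggregating many of them, which is the same intuition that makes ensemble learners work.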
Finally, to reply to your statement that you think the “logical consistency” idea does not go through, can I ask that you look at my reply to “misterbailey”, elsewhere in the comments? He asked a question, so I tried to clarify exactly where the logical inconsistency was located. Apparently he had misunderstood what the inconsistency was supposed to be. It might be that the way I phrased it there could shed some light on your disagreement with it. Let me know if it does.
I don’t think I agree with point 1 that you raise here:
1) Conclusions produced by my reasoning engine are always correct. [This is the Doctrine of Logical Infallibility]
I think that any active system has to implicitly follow a doctrine that I’ll state as “I did the best I could have, given what I knew at the time.” That is, I restate your (2) as the system knowing that, living in an uncertain universe and not having logical omniscience, it will eventually make mistakes. Perhaps in response it will shut itself down, and become an inactive system (this is the inconsistency that I think you’re pointing at). Or perhaps it will run the numbers and say “it’s better to try something than do nothing, even after taking the risk of mistakes into account.”
Now, of course, this isn’t an imperative to always act immediately without consideration. Oftentimes, the thing to try is “wait and think of a better plan” or “ask others if this is a good idea or not,” but the problem of discernment shows up again. What logic is the system using to determine when to stop thinking about new plans? What logic is it using to determine whether or not to ask for advice? If it could predict what mistakes it would make ahead of time, it wouldn’t make them!
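Here is a crude sketch of that regress (all names and numbers are hypothetical): the “act now / keep thinking / ask for advice” choice is itself made from estimates produced by the same fallible reasoner it is supposed to guard.

```python
# Sketch of the discernment problem: the meta-decision uses the agent's
# own fallible estimates, so the stopping rule inherits their errors.

def choose_meta_action(best_plan_value, est_gain_from_thinking,
                       thinking_cost, est_gain_from_advice, advice_cost):
    """Pick among act / deliberate / ask, using the agent's own estimates.

    The catch: every argument here is an output of the very reasoner whose
    mistakes we are worried about. If its estimate of the gain from more
    thinking is wrong, it will stop thinking at the wrong time.
    """
    options = {
        "act now": best_plan_value,
        "deliberate": best_plan_value + est_gain_from_thinking - thinking_cost,
        "ask for advice": best_plan_value + est_gain_from_advice - advice_cost,
    }
    return max(options, key=options.get)

# Example: the agent believes further thought is barely worth it, so it acts;
# but that belief is just another output of the same system.
print(choose_meta_action(best_plan_value=10.0,
                         est_gain_from_thinking=0.5, thinking_cost=0.6,
                         est_gain_from_advice=1.0, advice_cost=2.0))
# -> "act now"
```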
To go back to my analogy of aircraft design: suppose we talked about the “quality” of aircraft, which was some overall gestalt of how well they flew, and we eagerly looked forward to days when aircraft designs become better.
At one point, the worry is raised that future aircraft designs might be too high quality. On the face of it, this sounds ridiculous: how could it be that increasing the quality of the aircraft makes it less safe or desirable? Further elaboration reveals that there are two parts of aircraft design: engines and guidance systems. If engines grow much more powerful, but guidance systems remain the same, the aircraft might become much less safe—a tremble at the controls, and now the aircraft is spinning madly.
Relatedly, I found your “Because intelligence” section unsatisfying, because it seems like it’s resisting that sort of separation—separating ‘intelligence’ into, say, ‘cleverness’ (the ability to find a plan that achieves some consequence) and ‘wisdom’ (the ability to determine the consequences of a plan, and the desirability of those consequences) seems helpful when talking about designing intelligent agents.
I think Eliezer and others point out that systems that are very clever but not very wise are very dangerous and that cleverness and wisdom are potentially generated by different components. It seems to me that your models of intelligence have a deeper connection between cleverness and wisdom, and so you think it’s considerably less likely that we’ll get that sort of dangerous system that is clever but not wise.
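To make the decomposition concrete, here is a toy sketch (again hypothetical, not a claim about any real architecture): a “clever” component that maximizes the stated objective, and a “wise” one that also judges consequences.

```python
# Sketch of the cleverness/wisdom split as two separate components.
# Each candidate plan: (name, stated_objective_score, judged_side_effect_cost)
plans = [
    ("modest plan",      5.0,    0.0),
    ("clever plan",      9.0,    1.0),
    ("too-clever plan", 50.0, 1000.0),  # best on the stated objective, disastrous otherwise
]

def clever_choice(plans):
    # Pure cleverness: optimize the stated objective; consequences unexamined.
    return max(plans, key=lambda p: p[1])

def wise_choice(plans):
    # Cleverness checked by wisdom: score plans net of judged consequences.
    return max(plans, key=lambda p: p[1] - p[2])

print(clever_choice(plans)[0])  # -> "too-clever plan"
print(wise_choice(plans)[0])    # -> "clever plan"
```

If the two components really can come apart like this, then improving the planner without improving the evaluator is exactly the “more powerful engines, same guidance system” failure mode from the aircraft analogy.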