And so I wouldn’t say that a well-designed Friendly AI must necessarily refuse to push that one person off the ledge to stop the train. Obviously, I would expect any decent superintelligence to come up with a superior third alternative. But if those are the only two alternatives, and the FAI judges that it is wiser to push the one person off the ledge—even after taking into account knock-on effects on any humans who see it happen and spread the story, etc.—then I don’t call it an alarm light, if an AI says that the right thing to do is sacrifice one to save five. Again, I don’t go around pushing people into the paths of trains myself, nor stealing from banks to fund my altruistic projects. …
This bit sounds a little alarming considering how much more seriously Eliezer has taken other kinds of AI problems before, for example in this post.
I appreciate the straightforward logic of simply choosing the distinctly better of two outcomes, but what this overlooks is how automatically people perceive things as agents. I find it very alarming if an agent does not pay extra attention to the fact that its actions are leading to someone being harmed; I’d say people acting that way could potentially be very Unfriendly.
Although the post is titled “Ends Don’t Justify Means,” it also carries that little qualifier in parentheses: (Among Humans) … And an inability to generate better options is not proper justification for taking an action that results in one person being harmed while others are not, even if it is the lesser of two evils. Or at least I find that in particular very “alarming.”
Humans have an intrinsic tendency to perceive things as agents, but it’s not just our perception; sometimes things actually behave like agents, unless we count the quite accurate predictions often produced by agent-based models as a mere human flaw. For the sake of simplicity, suppose someone else finds the superior third option, while in the meantime this particular agent, unable to find that third option, decides to go for the better of the two outcomes and sacrifices one to save five. In such a case it would be a mistake. It also means taking a more active role in the causal chain of events influenced by agents.
My point being, I think it’s plausible to propose that a Friendly AI would NOT make that decision, because it should not be in the position of making that decision, and therefore any harm and tragedy that occurs would not originate from the AI. I’m not saying that it’s the wrong decision, but it certainly should not be an obvious decision, unless that is what we’re really talking about.
People doing this is, I think, a problem because people are bad at genuinely deciding based on the issues. I would rather live in a society where people were such that they could be trusted with the responsibility to push someone in front of a train when they had sufficient grounds to reasonably believe it was a genuinely positive action. But knowing that people are not like that, I would much rather they didn’t falsely believe they were, even if it sometimes causes suboptimal decisions in train scenarios.
In such a case it would be a mistake.
I don’t think you can automatically call a suboptimal decision a mistake.
This actually has a real-life equivalent: the situation of having to shoot down a plane that is believed to be under the control of terrorists and flying toward a major city. I would not want to be in the position of that fighter pilot, but I would also want him to fire.
And I’m much more willing to trust a FAI with that call than any human.
I don’t think you can automatically call a suboptimal decision a mistake.
Huh? You wouldn’t call a decision that results in an unnecessary loss of life a mistake, but rather a suboptimal decision? Note that I altered the hypothetical situation in my comment, and that “suboptimal decision” was labeled a mistake in the event that a third party comes up with a superior decision (i.e., one that would save all the lives).
And I’m much more willing to trust a FAI with that call than any human.
Edited:
There’s no FAI we can trust yet, and this particular detail seems to be about the Friendliness of an AI, so your belief seems a little out of place in this context. But never mind that: if there were an actual FAI, I suppose I’d agree.
I think there’s potential for severe error in the logic of the post, and I find it proper to criticize its substance despite the post being four years old. Anyway, for an omniscient being, putting no weight on the potential for error would seem reasonable.
You wouldn’t call a decision that results in an unnecessary loss of life a mistake, but rather a suboptimal decision?
I might decide to adopt a general, consistent strategy because of my own limitations. In this example, the limitation is that if I feel justified in engaging in this sort of behavior on occasion, I will feel justified in employing it on other occasions where the justification is insufficient.
If I employed a different general strategy with a similar level of simplicity, it would be less optimal.
Other strategies exist that are closer to optimal, but my limitations preclude me from employing them.
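To put rough numbers on that trade-off, here is a minimal toy sketch (my own illustration, not anything from the post or this thread; the probabilities and the function are invented purely for illustration). It assumes that genuinely warranted sacrifices are rare and that a biased judge sometimes mistakes unwarranted ones for warranted ones; under those assumptions, the blanket “never push” rule can beat case-by-case judgment on expected deaths.

```python
# Toy expected-value comparison of a blanket "never push" rule versus
# case-by-case judgment by a biased judge. All numbers are made up for
# illustration; this is not a model from the post.

def expected_deaths(strategy, p_real=0.02, p_false_alarm=0.2):
    """Expected deaths per 'push one to save five' situation.

    p_real:        probability the situation is genuine (pushing really saves five).
    p_false_alarm: probability a biased judge pushes in an illusory situation,
                   wrongly believing it to be genuine.
    The judge is assumed to always recognize the genuine cases.
    """
    if strategy == "never_push":
        # Genuine cases: the five die because we refused; illusory cases: no deaths.
        return p_real * 5
    elif strategy == "case_by_case":
        # Genuine cases: the one pushed person dies, the five are saved.
        # Illusory cases: a false alarm kills the pushed person for nothing.
        return p_real * 1 + (1 - p_real) * p_false_alarm * 1
    raise ValueError(f"unknown strategy: {strategy}")

if __name__ == "__main__":
    for s in ("never_push", "case_by_case"):
        print(s, round(expected_deaths(s), 3))
    # never_push   -> 0.1 expected deaths per situation
    # case_by_case -> 0.216 with these invented numbers
```

With these made-up numbers, the simple rule costs about 0.10 expected deaths per situation versus about 0.22 for case-by-case judgment; the comparison flips only if the judge’s false-alarm rate is low enough relative to how often the sacrifice is truly warranted.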
I think there’s potential for severe error in the logic of the post
Of course there is. If you can show a specific error, that would be great.