A lot of problems with this have already been listed in this thread. I’m going to add just two more: consider an otherwise pretty friendly AI that is curious about the universe and wants to understand the laws of physics. No matter how much the AI learns, it will conclude that it and humans misunderstand the basic laws of physics. The AI will likely spend tremendous resources trying to work out just what is wrong with its understanding, and given the prior of 1, it will never resolve the issue.
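To spell out why the prior of 1 can never be resolved, here is a minimal worked step, assuming the AI updates the hard-coded red-wire belief H by ordinary Bayesian conditioning. With P(H) = 1 and hence P(¬H) = 0, any evidence E with nonzero probability leaves the posterior exactly where it started:

\[
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)} = \frac{P(E \mid H) \cdot 1}{P(E \mid H) \cdot 1 + P(E \mid \neg H) \cdot 0} = 1
\]

So no observation, however surprising, can shift the belief; the only consistent move left for the AI is to conclude that something else in its model of physics must be wrong.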
Consider also the same scenario, but with an otherwise potentially friendly second AI that finds out how the first AI has been programmed. If this AI is at all close to what humans are like (again it is a mildly friendly AI) it will become paranoid about the possibility that there’s some similar programming issue in it. It might also use this as strong evidence that humans are jerks. The second AI isn’t going to remain friendly for very long.
“If this AI is at all close to what humans are like (again it is a mildly friendly AI) it will become paranoid about the possibility that there’s some similar programming issue in it”
The AI would notice it anyway. Given a sufficiently broken design it might be unable to care about that flaw, but if that’s the case, it won’t get paranoid over it either. It just doesn’t care.
Of course, if we break the design even more, we might get an AI that tries to combine a unified theory of physics with the “fact” that the red wire doesn’t actually kill it; the results of that would probably be worth their own comic series. That sort of AI, then again, is probably broken enough to be next to useless, but still an extremely dangerous piece of computing power. It would probably also explode hilariously if it could understand the analogy between itself and the crippled AI we’re discussing here, and actually care about that.
For your second AI, it is worth distinguishing between “friendly” and “Friendly”: it is Friendly, in the sense that it understands and appreciates the relatively narrow target that is human morality; it is just unimpressed with humans as allies.
That’s a valid distinction. But from the perspective of serious existential risk, an AI that has a similar morality but really doesn’t like humans poses almost as much existential risk as an Unfriendly AI.
“It might also use this as strong evidence that humans are jerks.”
If an AI doesn’t rapidly come to this conclusion after less than thirty minutes of internet access, it has a serious design flaw, no? :-)
“an AI that has a similar morality but really doesn’t like humans poses almost as much existential risk as an Unfriendly AI”
I agree.