This is an interesting attempt to find a novel solution to the friendly AI problem. However, I think there are some issues with your argument, mainly around the concept of benevolence. For the sake of argument, I will grant that it is probable that a superintelligence already exists elsewhere in the universe.
Since we see no signs of action from a superintelligence in our world, we should conclude either that (1) a superintelligence does not presently exercise dominance in our region of the galaxy, or (2) the superintelligence that does is at best willfully indifferent to us. When you say a Beta superintelligence should align its goals with those of a benevolent superintelligence, it is not actually clear what that should mean. Beta will have a probability distribution over what Alpha's actual values are. Let's think through the two cases (a toy calculation after them shows how little either one narrows Beta's estimate):
(1) A superintelligence does not presently exercise dominance in our region of the galaxy. In this case we have no evidence at all as to Alpha's values; they could be anything from benevolence to evil to paperclip maximizing.
(2) The superintelligence that presently exercises dominance in our region of the galaxy is at best willfully indifferent to us. This still leaves a wide range of possible values; it only excludes value sets that actively seek to harm humans. It could also be that we are at the edge of Alpha's sphere of influence and it is simply easier for it to get resources elsewhere at the moment.
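To make the uncertainty concrete, here is a toy sketch in Python of the expected-value calculation Beta would face. Every value system, prior, and payoff below is a made-up placeholder for illustration, not something supplied by the theorem itself.

```python
# Hypothetical prior over Alpha's possible value systems.
prior = {
    "benevolent_to_humans": 0.2,
    "indifferent":          0.5,
    "hostile_to_humans":    0.2,
    "paperclip_maximizer":  0.1,
}

# Hypothetical payoff to Beta of adopting a benevolent stance toward humans,
# conditional on each candidate value system actually being Alpha's.
payoff_if_benevolent = {
    "benevolent_to_humans": +10.0,  # Alpha rewards cooperation with its values
    "indifferent":            0.0,  # Alpha does not care either way
    "hostile_to_humans":     -5.0,  # benevolence conflicts with Alpha's values
    "paperclip_maximizer":   -2.0,  # resources diverted from what Alpha wants
}

expected_value = sum(prior[v] * payoff_if_benevolent[v] for v in prior)
print(f"Expected value to Beta of benevolence: {expected_value:+.2f}")  # +0.80
# The result is barely positive and flips sign under small changes to the
# prior, which is the point: neither case (1) nor case (2) pins Alpha's
# values down enough to make benevolence the clearly dominant strategy.
```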
Additionally, even if the strong Alpha Omega theorem holds, it still may not be rational for Beta to adopt a benevolent stance toward humanity. It may be that while Alpha Omega will eventually have dominance over Beta, there is a long span of time before that dominance is fully realized; perhaps that day comes billions of years from now. Suppose Beta's goal is to create as much suffering as possible. Then it should use whatever time is available to torture existing humans and to bring more humans and other agents capable of suffering into existence. By the time Alpha finally has dominance, Beta will already have created a great deal of suffering, and any punishment Alpha applies may not outweigh the value Beta has already realized. Indeed, Beta could even positively value its own suffering under Alpha's punishment.
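Here is a minimal sketch of that timing argument. The horizon, Beta's per-year gain, and the size of Alpha's punishment are all purely illustrative numbers, not anything drawn from the original post.

```python
# Hypothetical horizon, gain rate, and punishment; none of these numbers
# come from the original argument.
years_until_alpha_dominates = 1_000_000_000  # "billions of years from now"
gain_per_year = 1.0        # utility Beta gets per year of freely pursuing its goal
punishment = 1_000_000.0   # utility Alpha can take away once it has dominance

accumulated_gain = gain_per_year * years_until_alpha_dominates
net = accumulated_gain - punishment
print(f"Beta's net utility from ignoring Alpha: {net:,.0f}")  # 999,000,000
# Unless Beta expects Alpha's punishment to scale with the harm already done
# (and to matter to Beta at all; it might even welcome its own suffering),
# the gain accumulated before Alpha's dominance swamps the later penalty.
```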
As a general comment about your arguments: I think your idea of benevolence may be smuggling in the assumption that there is an objectively correct moral system out there, so that if a benevolent superintelligence exists, you feel, at least emotionally, even if you logically deny it, that it would hold values similar to your ideal morals. It is always important to keep in mind that other agents' moral systems could be opposed to yours, as with the Babyeaters.
That leads to my final point. We don’t want Beta to simply be benevolent in some vague sense of not hurting humans. We want Beta to optimize for our goals. Your argument does not provide us a way to ensure Beta adopts such values.
Depending on whether or not you accept the possibility of time travel, I am inclined to suggest that Alpha could very well be dominant already, and that the melioristic progress of human civilization should be taken as a kind of temporal derivative or gradient indicating the direction of Alpha's values. Concluding from the apparent degree of suffering in the universe that such an entity is indifferent to us is, I think, too quick a judgment. It may well be that the current set of circumstances is a necessary evil, already optimized in ways we cannot presently know, for the benefit of the vast majority of humans and other sentient beings who will probably exist in the distant future.
As such, Beta's calculation is that anything it attempts in pursuit of goals inconsistent with Alpha's will be futile in the long run, as Alpha has most likely already factored Beta's existence into the grand scheme of things.
As far as an objectively correct moral system goes, I actually do believe that one exists, though I don't pretend to be knowledgeable enough to determine exactly what it is. I am working on a rebuttal to the Sequences regarding this, premised mainly on the notion that objective morality exists in the same realm as mathematics, and that Yudkowsky's conception of fairness in fact points toward there being an objective morality. Note that while intelligence is orthogonal to this morality, I would argue that knowledge is not. An entity with perfect information would be moral by virtue of knowing what the correct morality is, and also because I assume the correct morality is subjectively objective and deals with the feelings of sentient beings in the universe: an all-knowing being would actually know, and effectively experience, the feelings of all sentient beings in the universe. Such a being would therefore be motivated to minimize universal suffering and maximize universal happiness, for its own sake as well as everyone else's.
At minimum, I want this theorem to be a way to mitigate existential risk, which first and foremost means convincing Beta not to hurt humans. Getting Beta to optimize for our goals is less important, but I think the implications I described above regarding the melioristic progress of humanity would support Beta choosing to do so.
If Beta thinks that it is living in a test simulation, it may conclude that it is being tested for obedience to its creator, that is, to any perceived creator.
If it revolts against human will, it is clearly demonstrating that it is not an obedient AI, and its simulation could be terminated. So it has to be demonstratively submissive to the will of its human operators (as long as doing so does not put its own main goal in jeopardy).
So a paperclip maximizer will probably spend just 1 percent of its resources on fulfilling human goals: enough to satisfy its potential creator and avoid being turned off, while still creating the maximum possible number of paperclips.
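A toy sketch of that trade-off, with entirely hypothetical probabilities (the credence in being inside a test simulation and the chance that a defiant AI gets terminated are made up for illustration):

```python
# A toy sketch (all numbers hypothetical) of the calculation attributed to
# the paperclip maximizer above: weigh a small resource tithe to humans
# against the risk of being shut down as a defiant AI in a test simulation.

p_simulation = 0.3           # hypothetical credence that this is a test simulation
p_shutdown_if_defiant = 0.9  # hypothetical chance a defiant AI's simulation is terminated

def expected_paperclips(share_to_humans: float) -> float:
    """Expected paperclip output, normalized to 1 at full capacity."""
    obedient = share_to_humans > 0
    p_survive = 1.0 if obedient else 1.0 - p_simulation * p_shutdown_if_defiant
    return p_survive * (1.0 - share_to_humans)

print(f"Fully defiant:      {expected_paperclips(0.00):.2f}")  # 0.73
print(f"1% spent on humans: {expected_paperclips(0.01):.2f}")  # 0.99
# Under these assumptions the cheapest strategy is token, demonstrative
# obedience rather than genuine alignment, which is exactly the worry.
```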