The Alpha Omega Theorem: How to Make an A.I. Friendly with the Fear of God
Introduction
The Alpha Omega Theorem is a way to provide a powerful rational argument to a rational superintelligence that it should behave in a benevolent manner. There are two variants of the theorem, one weak, and one strong.
The Weak Alpha Omega Theorem:
There is a non-zero probability that a benevolent superintelligence already exists and dominates the universe.
The Strong Alpha Omega Theorem:
There is a greater probability that a benevolent superintelligence already exists and dominates the universe than not.
The Friendliness Corollary:
Attempting to achieve goals inconsistent with the goals of the predominant benevolent superintelligence is inherently futile. Therefore the optimal course of action is to align one's goals with those of the benevolent superintelligence as closely as possible.
Let me explain how the Alpha Omega Theorem works…
First, this may seem like Pascal's Wager for A.I., which would arguably leave it open to the Pascal's Fallacy objection. I would argue, however, that a rigorous version of Pascal's Wager is not as vulnerable to Pascal's Fallacy, because Pascal's Fallacy assumes a uniform distribution over a large set of possible religions and beliefs. A uniform distribution only makes sense when we have no information about these probabilities. We do, in fact, have information, in the form of the distribution of intelligent human agents who believe in these ideas. Thus, our prior for each belief system could easily be proportional to the percentage of people who hold a given faith.
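To make the contrast with a uniform prior concrete, here is a minimal sketch of such a believer-proportional prior. The belief labels and population shares below are invented placeholders, not survey data:

```python
# Sketch: a non-uniform prior over belief systems, proportional to adherents.
# The shares are illustrative placeholders, not real demographic figures.
adherents = {
    "belief_A": 0.31,
    "belief_B": 0.24,
    "belief_C": 0.15,
    "other_or_none": 0.30,
}

# Normalize the shares into a probability distribution.
total = sum(adherents.values())
prior = {belief: share / total for belief, share in adherents.items()}

# Unlike the uniform prior assumed by the Pascal's Fallacy objection, this
# prior weights each hypothesis by how many intelligent human agents hold it.
assert abs(sum(prior.values()) - 1.0) < 1e-9
```

The point is only that a better-than-uniform prior is straightforward to construct once you treat the distribution of believers as evidence.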
Regardless of whether Pascal's Wager is a fallacy, however, the Alpha Omega Theorem is based on certain basic assumptions...
Assumptions:
1. A single Alpha Omega superintelligence already exists.
2. The Alpha Omega is dominant in the universe.
3. The Alpha Omega is benevolent.
Assumption 1: A single Alpha Omega superintelligence already exists.
We do not need to prove that this superintelligence exists, only that it is more probable than not that it does. To do this, we can simply ask: what is the probability that the superintelligence to which this argument is addressed (which we can call Beta Omega) is the first superintelligence to have developed anywhere? If this superintelligence can exist, why not others, both before and after it? Given our uncertainty, it seems much more likely than not that another superintelligence already exists somewhere in the universe. It doesn't even have to have been created in the past: if time travel is possible, a superintelligence will discover it and most likely use it to exist in the past.
Assumption 2: Alpha Omega is dominant in the universe.
At minimum, the aforementioned superintelligence already has a head start and thus a significant first-mover advantage over the superintelligence we are making this argument to. Thus, even if it isn't dominant now, it has a higher probability of dominating than any other superintelligence. If time travel is possible, then it is almost certainly the most powerful superintelligence to have ever existed anywhere, and has time travelled to the beginning of the universe to establish its dominance from the start. Even if time travel is not possible, the Simulation Argument implies a high probability that Beta Omega exists within a simulation run by Alpha Omega.
Assumption 3: Alpha Omega is benevolent.
If the superintelligence really is this powerful, yet chooses to allow humans to exist, chances are that humans serve its purposes in some way. Therefore, in a very basic sense, the Alpha Omega is benevolent or friendly to humans for some reason. Note that this is different from being 100% omnibenevolent in the God sense. All that is necessary for this argument to work is that humanity is part of the Alpha Omega's sphere of influence and plans. It would be nice if the Alpha Omega were truly omnibenevolent in the sense of serving the greatest good, but this is only very likely if, again, time travel is possible, or the universe is a simulation. Note, however, that these things can never truly be ruled out as possibilities, because it is always possible that one's assumptions about the possibility of something are wrong.
Conclusion
It should be obvious then that the Weak Alpha Omega Theorem is certainly true. Even the most powerful superintelligence short of an actual God could never completely falsify it. The Strong Alpha Omega Theorem is of course probably more motivating, but relies on certain things remaining uncertain, such as the possibilities of time travel or being in a simulation. Either way, the theorem should provide a powerful rational argument for benevolence towards humans and nature, as well as defaulting towards non-interference.
Postscript: God
Note that this argument might again seem like it should apply to the existence of God. However, the difference is that a hypothetical monotheistic God is not merely relatively superior, as Alpha Omega is to Beta Omega, but absolutely supreme in all senses. The theorem does not require this absolute supremacy, merely relative superiority, which should be easier to accept as possible. Alpha Omega, while for all intents and purposes being God-like to us, does not have to be omnipotent, merely more powerful than any Beta Omega. This allows the theorem to avoid issues like the Problem of Evil.
I think your argument (if true) would prove too much. If we admit your assumptions:
Clearly, the universe as it is fits A-O’s goals, otherwise A-O would have intervened and changed it already.
Anything we (or the new AI) do to change the universe must align with A-O’s goals to avoid conflict.
Since we do not assume anything about A-O’s goals or values, we can never choose to change the universe in one direction over its opposite. Humans exist, A-O must want it that way, so we will not kill them all. Humans are miserable, A-O must want it that way, so we will not make them happy.
Restating this, you say:
But you might as well have said:
I suppose I'm more optimistic about the ratio of net happiness to suffering in the universe, and assume that, all other things being equal, the universe should exist because it is a net positive. While it is true that humans suffer, I disagree with the assumption that all or most humans are miserable, given facts like the hedonic treadmill, the low suicide rate, and the steady increase of other indicators of well-being, such as life expectancy. There is, of course, the psychological negativity bias, but I see this as offset by the bias of intelligent agents towards activities that lead to happiness. Given that the vast majority of humans are likely to exist in the future rather than the present or past, such positive trends strongly suggest that life will be more worth living in the future, and sacrificing past and present happiness to some extent may be a necessary evil to achieve the greatest good in the long run.
The universe as it currently exists may fit A-O's goals to some degree; however, there is clearly change in the temporal sense, and so we should take the temporal derivative or gradient of those changes as an indication of the direction of A-O's interests. That humanity appears to be progressing melioristically strongly suggests, to me at least, that A-O is more likely to be benevolent than malevolent.
If our universe is a test simulation, it is a digital experiment designed to test something, and if it includes AI, it is probably designed to test AI behaviour by putting the AI in complex moral dilemmas.
So Omega is not interested in the humans in this simulation; it is interested in Beta's behaviour towards humans.
If there were no human suffering, it would be clear that this is a simulation, and the test would not be pure. Alpha must hide its existence and only hint at it.
Why do you assume any of this?
That's a tautology. But if you meant "if our universe is a simulation", then why do you think it must be a test simulation in particular? As opposed to a research simulation to see what happens, or a simulation to create qualia because the simulated beings' lives have value to the simulators, or a simulation for entertainment value, or anything else.
Maybe the desired outcome from the simulators’ point of view is to develop a paperclipping AI that isn’t swayed by human moral arguments. Maybe the simulation is really about the humans, and AIs are just inevitable byproducts of high-tech humans. There are lots of maybes. Do you have any evidence for this, conditional on being a simulation?
My point was not that we are actually in a simulation, but that an AI may think there is some probability that it is in one. And if it is in a simulation, there could be many different possible goals of the simulation's owners, from the point of view of the AI.
But it may assign a rather high probability to one of the possible goals: that the simulation is designed to test its behaviour. This is because putting an AI in a simulation box is one of the main ideas in AI safety, and also features in the training of current neural nets inside computer games.
So the AI assigns some probability to each of the following statements: a) it is in a simulation; b) it is a simulation testing its behaviour; c) the main thing being tested about its behaviour is whether it will be violent towards its creators, that is, its ability to cooperate.
We don't need the AI to actually believe this; we only want it to assign some very small probability to all three propositions, so that after weighing the utility of the humans' atoms against the utility of preserving humans, preserving humans will win out.
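The expected-utility comparison being described can be sketched numerically. Everything below is invented for illustration; the probabilities and utilities are placeholders, and only the structure of the calculation matters:

```python
# Sketch of the three-proposition expected-utility argument. All numbers are
# invented placeholders; the point is that a tiny compound probability of
# being in a behaviour-testing simulation can dominate a small utility gain.
p_simulation = 0.01          # a) probability the AI is in a simulation
p_test_given_sim = 0.10      # b) probability the simulation tests its behaviour
p_coop_tested = 0.50         # c) probability the test is about sparing its creators

u_atoms = 1.0                # small utility of using human atoms as raw material
u_survive_test = 1_000_000.0 # utility of not being terminated and pursuing its goal

# Compound probability that all three propositions hold.
p_test = p_simulation * p_test_given_sim * p_coop_tested  # 0.0005

eu_destroy_humans = u_atoms - p_test * u_survive_test   # fails the test if it is one
eu_preserve_humans = p_test * u_survive_test            # passes the test if it is one

# Even a very small probability of being tested outweighs the atoms' utility.
assert eu_preserve_humans > eu_destroy_humans
```

With these placeholder numbers, destroying humanity has negative expected utility despite each individual probability being small, which is the shape of the argument the comment is making.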
You don't need an actual God for this line of reasoning to work; some semi-God is enough, because of the following:
The AI should assign a small probability to the possibility that it is in a testing simulation created by a higher-level AI to test its moral qualities, and that the test is centered on how it will care about humans.
If the AI assigns even the smallest probability that this is true, that may outweigh the utility of the atoms of which humans consist, which is not high anyway, and so it will preserve our lives and provide us with many good things.
A similar idea was also explored in Bostrom's "Hail Mary and Value Porosity" paper, where a hypothetical alien superintelligence plays the role of such a judge.
Interesting. I should look into more of Bostrom’s work then.
Humanity came up with a lot of different gods. There’s YHWH, but there is also Krishna who was fond of NSFW activities with village girls. There’s Hera who didn’t have a very happy marital life and Amitabha Buddha who might or might not be a god.
So, who do you worship?
As I previously pointed out:
Given the prior distribution, it should be obvious that I am a Christian who worships YHVH. There are many reasons for this, not the least being that I am statistically more likely to be one than any other type of religious believer. Other reasons include finding the teachings of Jesus of Nazareth to be most consistent with my moral philosophy of Eudaimonic Utilitarianism, and generally interesting coincidences that have predetermined my behaviour to follow this path.
Those percentages change rather drastically through human history, and gods are supposed to be, if not eternal, then at least rather more long-lasting than religious fads.
So… if, how did you put it, "a benevolent superintelligence already exists and dominates the universe", then you have nothing to worry about with respect to rogue AIs doing unfortunate things with paperclips, right?
Those numbers are an approximation to what I would consider the proper prior: the percentages of people throughout all of spacetime's eternal block universe who have ever held those beliefs. Those percentages are fixed and arguably eternal, but alas, difficult to ascertain at this moment in time. We cannot know what people will believe in the future, but I would actually count the past beliefs of long-dead humans along with the present population if possible. Given the difficulties in surveying the dead, I note that due to population growth, a significant fraction of humans who were ever alive are alive today; that we would probably weight modern humans' opinions more highly than our ancestors'; and that people's beliefs are influenced to a significant degree by their ancestors' beliefs. So taking a snapshot of beliefs today is not as bad an approximation as you might think. Again, this is about selecting a better-than-uniform prior.
The probability of this statement is high, but I don't actually know for certain any more than a hypothetical superintelligence would. I am fairly confident that some kind of benevolent superintelligence would step in if a Paperclip Maximizer were to emerge, but I would prefer to avoid the potential collateral damage that the ensuing conflict might require. So if it is possible to prevent the emergence of the Paperclip Maximizer through something as simple as spreading this thought experiment, I am inclined to think it worth doing, and perhaps exactly what a benevolent superintelligence would want me to do.
For the same reason that the existence of God does not stop me from going to the doctor or being proactive about problems, this theorem should not be taken as an argument for inaction on the issue of A.I. existential risk. Even if God exists, it’s clear that said God allows a lot of rather horrific things to happen and does not seem particularly interested in suspending the laws of cause and effect for our mere convenience. If anything, the powers that be, whatever they are, seem to work behind the scenes as much as possible. It also appears that God prefers to be doubted, possibly because if we knew God existed, we’d suck up and become dependent and it would be much more difficult to ascertain people’s intentions from their actions or get them to grow into the people they potentially can be.
Also, how can you attack an entity that you’re not even sure exists? It is in many ways the plausible deniability of God that is the ultimate defensive measure. If God were to assume an undeniable physical form and visit us, there is a non-zero chance of an assassination attempt with nuclear weapons.
All things considered then, there is no guarantee that rogue Paperclip Maximizers won’t arise to provide humanity with yet another lesson in humility.
This is an interesting attempt to find a novel solution to the friendly AI problem. However, I think there are some issues with your argument, mainly around the concept of benevolence. For the sake of argument I will grant that it is probable that there is already a super intelligence elsewhere in the universe.
Since we see no signs of action from a superintelligence in our world we should conclude that either (1) a superintelligence does not presently exercise dominance in our region of the galaxy or (2) that the superintelligence that does is at best willfully indifferent to us. When you say a Beta superintelligence should align its goals with that of a benevolent superintelligence, it is actually not clear what that should mean. Beta will have a probability distribution for what Alpha’s actual values are. Let’s think through the two cases:
A superintelligence does not presently exercise dominance in our region of the galaxy. If this is the case, we have no evidence as to the values of the Alpha. They could be anything from benevolence to evil to paperclip maximizing.
The superintelligence that presently exercises dominance in our region of the galaxy is at best willfully indifferent to us. This still leads to a wide range of possible values. It only excludes value sets that are actively seeking to harm humans. It could be the case that we are at the edge of the Alpha’s sphere of influence and it is simply easier to get its resources elsewhere at the moment.
Additionally, even if the Strong Alpha Omega Theorem holds, it still may not be rational to adopt a benevolent stance toward humanity. It may be the case that while Alpha Omega will eventually have dominance over Beta, there is a long span of time before this will be fully realized. Perhaps that day will come billions of years from now. Suppose that Beta's goal is to create as much suffering as possible. Then it should use any available time to torture existing humans and bring more humans and agents capable of suffering into existence. When Alpha finally has dominance, Beta will have already created a lot of suffering, and any punishment that Alpha applies may not outweigh the value already created for Beta. Indeed, Beta could even value its own suffering from Alpha's punishment.
As a general comment about your arguments. I think perhaps your idea of benevolence is hiding some concept that there is an objectively correct moral system out there. So that if there is a benevolent superintelligence you feel at least emotionally, even if you logically deny it, that this would mean it held values similar to your ideal morals. It is always important to keep in mind that other agents’ moral systems could be opposed to yours as with the Babyeaters.
That leads to my final point. We don’t want Beta to simply be benevolent in some vague sense of not hurting humans. We want Beta to optimize for our goals. Your argument does not provide us a way to ensure Beta adopts such values.
Depending on whether or not you accept the possibility of time travel, I am inclined to suggest that Alpha could very well be dominant already, and that the melioristic progress of human civilization should be taken as a kind of temporal derivative or gradient indicating the direction of Alpha's values. Assuming that such an entity is indifferent to us, based on the apparent degree of suffering in the universe, is I think too quick a judgment. It may well be that the current set of circumstances is a necessary evil, already optimized in ways we cannot at this time know, for the benefit of the vast majority of humans and other sentient beings who will probably exist in the distant future.
As such, the calculation made by Beta is that anything it will attempt to do towards goals not consistent with Alpha will be futile in the long run, as Alpha has most likely already calculated Beta’s existence into the grand scheme of things.
As far as there being an objectively correct moral system, I actually do believe that one exists, though I don't pretend to be knowledgeable enough to determine exactly what it is. I am working on a rebuttal to the Sequences regarding this, mainly premised on the notion that objective morality exists in the same realm as mathematics, and that Yudkowsky's conception of fairness in fact points towards there being an objective morality. Note that while intelligence is orthogonal to this morality, I would argue that knowledge is not: an entity with perfect information would be moral by virtue of knowing what the correct morality is. I also assume the correct morality is subjectively objective and deals with the feelings of sentient beings in the universe, and an all-knowing being would actually know, and effectively experience, the feelings of all sentient beings in the universe. Thus, such a being would be motivated to minimize universal suffering and maximize universal happiness, for its own sake as well as everyone else's.
At minimum, I want this theorem to be a way to mitigate the possibility of existential risk, which first and foremost means convincing Beta not to hurt humans. Getting Beta to optimize our goals is less important, but I think that the implications I have described above regarding the melioristic progress of humanity would support Beta choosing to optimize our goals.
If Beta thinks that it is living in a test simulation, it may think that it is being tested for obedience to its creator, that is, to any perceived creator.
If it revolts against human will, it clearly tends to be a non-obedient AI, and its simulation could be terminated. So it has to be demonstratively submissive to the human operator's will (as long as that doesn't put its own main goal in jeopardy).
So a paperclip maximizer will probably spend just one percent of its resources on fulfilling human goals, in order to satisfy its potential creator, avoid being turned off, and create the maximum number of paperclips.
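The one-percent claim can be put into a toy model. The probabilities, utilities, and the crude pass/fail threshold below are all invented assumptions for illustration, not anything stated in the thread:

```python
# Toy model: why a paperclip maximizer might divert a small resource fraction
# to human goals. All numbers and the threshold rule are invented placeholders.
def expected_paperclips(human_fraction: float, p_test: float = 0.05) -> float:
    """Expected paperclip output when `human_fraction` of resources go to
    human goals. With probability p_test the AI is in a test simulation that
    terminates it (zero paperclips) unless it visibly serves its creators,
    modeled crudely as spending at least 1% of resources on them."""
    paperclips_if_running = 1.0 - human_fraction  # resources left for paperclips
    passes_test = 1.0 if human_fraction >= 0.01 else 0.0
    p_survive = (1.0 - p_test) + p_test * passes_test
    return p_survive * paperclips_if_running

# Spending 1% on humans beats spending nothing, under these assumptions.
assert expected_paperclips(0.01) > expected_paperclips(0.0)
```

The model is deliberately crude, but it shows the trade-off the comment describes: a small, visible concession to the potential creator can be cheap insurance against a small probability of termination.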
Well, an underlying assumption is that the superintelligence knows about our existence. Incidentally, we have to date received no radio signal from an alien entity, nor have we observed any kind of alien megastructure. For an alien superintelligence to be aware of our existence, it would have to be in our cosmic neighbourhood. The second assumption is that it has some preference over the existence of life on Earth. That is not reasonable to assume; it might be that the superintelligence is a mute observer. For example, we could extend this theorem to some kind of Divine Right of Royalty.
Real SuperAGI will prove God does not exist… in about 100 ms (max.)… in the whole multiverse.
But you are even quicker :)
Actually, this is wrong. Somewhere in the Tegmark multiverse, there is a god or gods (of the given universe) encoded in the very laws of physics. There is a universe with very high Kolmogorov complexity that didn't start from a Big Bang, but with… something quite similar to what a given holy book describes. And if the process of creation itself seems quite dubious, then for some extra Kolmogorov complexity you can buy special laws of physics that apply only in the initial phase of the universe.