a model which agrees to constrain itself in such a radical fashion as to never self-modify in an exploratory fashion is fundamentally not superintelligent
OK, what is your definition of “superintelligent”?
Being able to beat humans in all endeavours by miles.
That includes the ability to explore novel paths.
What do you mean by humans? How large a group of humans? Infinite?
10 billion
But then it is possible for an AI to be able to beat up to 10 billion humans in all endeavours by miles, but also not modify itself.
In fact, I can prove that such an AI exists.
So you have two different and contradictory definitions of “superintelligence” that you are using.
A realistic one, which can competently program and can competently do AI research?
Surely, since humans do pretty impressive AI research, a superintelligent AI will do better AI research.
What exactly might (even potentially) prevent it from creating drastically improved variants of itself?
A superintelligence based on the first definition you gave (Being able to beat humans in all endeavours by miles) would be able to beat humans at AI research, but it would also be able to beat humans at not doing AI research.
So, by your own definition, in order to be a superintelligence, it must be able to spend the whole lifetime of the universe not doing AI research.
You mean, a version which decides to sacrifice exploration and self-improvement, despite it being so tempting...
And that after doing quite a bit of exploration and self-improvement (otherwise it would not have gotten to the position of being powerful in the first place).
But then deciding to turn around drastically and become very conservative, and to impose a new “conservative on a new level world order”...
Yes, that is a logical possibility...
Yes, OK.
I doubt that an adequate formal proof is attainable, but a mathematical existence of a “lucky one” is not implausible...
Yes, an informal argument is that if it is way smarter and way more capable than humans, that it potentially should be better at being able to refrain from exercising the capabilities.
In this sense, the theoretical existence of a superintelligence which does not make things worse than they would be without existence of this particular superintelligence seems very plausible, yes… (And it’s a good definition of alignment, “aligned == does not make things notably worse”.)
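So these two considerations
“if it is way smarter and way more capable than humans, that it potentially should be better at being able to refrain from exercising the capabilities”
and
“aligned == does not make things notably worse”
taken together indeed constitute a nice “informal theorem” that the claim of “aligned superintelligence being impossible” looks wrong. (I went back and added my upvotes to this post, even though I don’t think the technique in the linked post is good.)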
why not?
I think I said already.
We are not aiming for a state to be reached. We need to maintain some properties of processes extending indefinitely in time. That formalism does not seem to do that. It does not talk about invariant properties of processes and other such things, which one needs to care about when trying to maintain properties of processes.
We don’t know fundamental physics. We don’t know the actual nature of quantum space-time, because quantum gravity is unsolved; we don’t know what the “true logic” of the physical world is, and so on. There is no reason to think one can rely on simple-minded formalisms, on standard Boolean logic, on discrete tables and so on, if one wants to establish something fundamental, when we don’t really know the nature of the reality we are trying to approximate.
There are a number of reasons a formalization could fail even if it goes as far as proving the results within a theorem prover (which is not the case here). The first and foremost of those reasons is that the formalization might fail to capture reality with a sufficient degree of faithfulness. That is almost certainly the case here.
But then a formal proof (an adequate version of which is likely to be impossible at our current state of knowledge) is not required. The simple informal argument above is more to the point. It’s a very simple argument, and so it makes the idea that “aligned superintelligence might be fundamentally impossible” very unlikely to be true.
First of all, one step this informal argument is making is weakening the notion of “being aligned”. We are only afraid of “catastrophic misalignment”, so let’s redefine alignment as something simple which avoids that. An AI which sufficiently takes itself out of action does achieve that. (I actually asked for something a bit stronger, “does not make things notably worse”; that’s also not difficult, via the same mechanism of taking oneself sufficiently out of action.)
And a strongly capable AI should be capable of taking itself out of action, of refraining from doing things. The capability to choose is an important capability; a strongly capable system is a system which, in particular, can make choices.
So, yes, a very capable AI system can avoid being catastrophically misaligned, because it can choose to avoid action. This is that non-constructive proof of existence which has been sought. It’s an informal proof, but that’s fine.
No extra complexity is required, and no extra complexity would make this argument better or more convincing.
You can run all the same arguments I used, but talk about processes rather than states.
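On one hand, you still assume too much:
“Since our best models of physics indicate that there is only a finite amount of computation that can ever be done in our universe”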
No, nothing like that is known at all. It’s not a consensus. There is no consensus that the universe is computable; this is very much a minority viewpoint, and it might always make sense to augment a computer with a (presumably) non-computable element (e.g. a physical random number generator, an analog circuit, a camera, a reader of human real-time input, and so on). AI does not have to be a computable thing; it can be a hybrid. (In fact, when people model real-world computers as Turing machines instead of modeling them as Turing machines with oracles, with the external world being the oracle, it leads to all kinds of problems, e.g. Penrose’s well-known “Gödel argument” makes this mistake and falls apart as soon as one remembers the presence of the oracle.)
Other than that...
Yes, you have an interesting notion of alignment. It is not the notion of something which we might want and which might be possible, yet unachievable by mere humans, but something much weaker than that (although not as weak as the version I put forward; my version is super-weak, and your version is intermediate in strength):
“I claim then that for any generically realizable desirable outcome that is realizable by a group of human advisors, there must exist some AI which will also realize it.”
Yes, this is obviously correct. An ASI can choose to emulate a group of humans and its behavior, and being way more capable than that group of humans, it should be able to emulate that group as precisely as needed.
One does not need to say anything else to establish that.
I disagree; modern physics places various bounds on compute, such as the Bekenstein bound.
https://en.wikipedia.org/wiki/Bekenstein_bound
If your objection to my proof involves infinite compute then I am happy to acknowledge that I honestly do not know what happens in that case. It is plausible that since humans are finite in complexity/information/compute, a world with infinite compute would break the symmetry between computers and humans that I am using here. Most likely it means that computers are capable of fundamentally superior outcomes, so there would be “hyperaligned” AIs. But since infinite compute is a minority position I will not pursue it.
I don’t see what the entropy bound has to do with compute. The Bekenstein bound is not much in question, but its link to compute is a different story. It does seem to limit how many bits can be stored in a finite volume (so for a potentially infinite compute an unlimited spatial expansion is needed).
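To make the storage limit concrete, here is a minimal sketch that evaluates the standard information form of the Bekenstein bound, I ≤ 2πRE / (ħc ln 2) bits, for a sphere of radius R enclosing energy E. The function name `bekenstein_bits` and the choice to take E = mc² from a rest mass are illustrative assumptions, not something taken from the thread. It only bounds stored bits; it says nothing about computability.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s
C = 299_792_458.0       # speed of light, m/s


def bekenstein_bits(radius_m: float, mass_kg: float) -> float:
    """Upper bound on bits storable in a sphere of the given radius.

    Assumption for illustration: the enclosed energy is the rest energy
    E = m * c^2 of the given mass.
    """
    energy_j = mass_kg * C ** 2
    return 2 * math.pi * radius_m * energy_j / (HBAR * C * math.log(2))


# One kilogram inside a one-metre sphere: roughly 2.6e43 bits.
bits = bekenstein_bits(1.0, 1.0)
print(f"{bits:.3e} bits")
```

The bound grows linearly in both radius and energy, which is why unbounded storage would need unbounded spatial expansion.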
But it does not say anything about the possibility of non-computable processes. It’s not clear if “collapse of wave function” is computable, and it is typically assumed not to be computable. So powerful non-Turing-computable oracles seem likely to be available (that’s much more than “infinite compute”).
But I also think all these technicalities are overkill; I don’t see them as relevant at all.
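This seems rather obvious regardless of the underlying model:
“An ASI can choose to emulate a group of humans and its behavior, and being way more capable than that group of humans, it should be able to emulate that group as precisely as needed.”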
This seems obviously true, no matter what.
I don’t see why a more detailed formalization would help to further increase certainty. Especially when there are so many questions about that formalization.
If the situation were different, if the statement were not obvious, even a loose formalization might help. But when the statement seems obvious, the standards a formalization needs to satisfy to further increase our certainty in the truth of the statement become really high...
The wavefunction never actually collapses if you believe in MWI. Rather, a classical reality emerges in all branches thanks to decoherence.
If you think something noncomputable happens because of quantum mechanics, it probably means that your interpretation of QM is wrong and you need to read the Sequences on that.
If you believe in MWI, then this whole argument is… not “wrong”, but very incomplete...
Where is the consideration of branches? What does it mean for one entity to be vastly superior to another, if there are many branches?
If one believes in MWI, then the linked proof does not even start to look like a proof. It obviously considers only a single branch.
And a “subjective navigation” in the branches is not assumed to be computable, even if the “objective multiverse” is computable; that is the whole point of MWI, the “collapse” becomes “subjective navigation”, but this does not make it computable. If a consideration is only of a single branch, that branch is not computable, even if it is embedded in a large computable multiverse.
Not every subset of a computable set (say, of a set of natural numbers) is computable.
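For concreteness, the standard witness for this claim can be written out as follows. The notation is the usual one from computability theory and is an assumption of this sketch, not something used in the thread: \varphi_e is the e-th partial computable function and \downarrow means "halts".

```latex
% The whole set of naturals is trivially computable:
\[
  \chi_{\mathbb{N}}(n) = 1 \quad \text{for all } n \in \mathbb{N}.
\]
% The diagonal halting set is a subset of it that is not computable:
\[
  K = \{\, e \in \mathbb{N} : \varphi_e(e)\downarrow \,\} \subseteq \mathbb{N},
  \qquad K \text{ is not computable.}
\]
% A counting argument gives the same conclusion without naming a witness:
\[
  \bigl|\{\, A \subseteq \mathbb{N} : A \text{ computable} \,\}\bigr|
  = \aleph_0 < 2^{\aleph_0} = \bigl|\mathcal{P}(\mathbb{N})\bigr|.
\]
```

So "embedded in something computable" does not by itself imply "computable", which is all the single-branch remark needs.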
An interpretation of QM can’t be “wrong”. It is a completely open research and philosophical question, there is no “right” interpretation, and the Sequences is (thankfully) not a Bible (if even a very respected thinker says something, this does not yet mean that one should accept that without questions).
Thanks to decoherence, you can just ignore any type of interference and treat each branch as a single classical universe.
I don’t think so. If it were classical, we would not be able to observe effects of double-slit experiments and so on.
And, also, there is no notion of “our branch” until one has traveled along it. At any given point in time, there are many branches ahead. Only looking back one can speak about one’s branch. But looking forward one can’t predict the branch one will end up in. One does not know the results of future “observations”/”measurements”. This is not what a classical universe looks like.
(Speaking of MWI, I recall David Deutsch’s “Fabric of Reality” very eloquently explaining effects from “neighboring branches”. The reason I am referencing this book is that this was the work particularly strongly associated with MWI back then. So I think we should be able to rely on his understanding of MWI.)
Yes, one can: all of them!
Yes, but then what do you want to prove?
Something like, “for all branches, [...]”? That might not be that easy to prove or even to formulate. In any case, the linked proof has not even started to deal with this.
Something like, “there exists a branch such that [...]”? That might be quite tractable, but probably not enough for practical purposes.
“The probability that one ends up in a branch with such and such properties is no less than/no more than” [...]? Probably something like that, realistically speaking, but this still needs a lot of work, conceptual and mathematical...
Bringing QM into this is not helping. All these types of questions are completely generic QM questions, and ultimately they come down to the measure | |Ψ⟩ |².
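As a toy illustration of what "coming down to the measure" means, here is a minimal sketch of the Born rule applied to a probability-style statement about branches. The two-branch state, the labels, and the helper name `branch_probability` are all hypothetical; real branch structure is far less tidy than a small dictionary of amplitudes.

```python
def branch_probability(amplitudes, has_property):
    """Born-rule weight of the branches that satisfy a predicate.

    amplitudes: mapping from a (hypothetical) branch label to its complex amplitude.
    has_property: predicate on branch labels.
    """
    # Each branch is weighted by the squared modulus of its amplitude.
    weights = {label: abs(a) ** 2 for label, a in amplitudes.items()}
    total = sum(weights.values())  # normalise in case the state is not normalised
    good = sum(w for label, w in weights.items() if has_property(label))
    return good / total


# Toy state: 0.6 |A> + 0.8i |B>
amplitudes = {"A": complex(0.6, 0.0), "B": complex(0.0, 0.8)}
p = branch_probability(amplitudes, lambda label: label == "A")
print(p)
```

A statement of the form "the probability of ending up in a branch with such and such properties is at least p" is then a lower bound on this kind of sum.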
It’s just… having a proof is supposed to boost our confidence that the conclusion is correct...
If the proof relies on assumptions which are already quite far from the majority opinion about our actual reality (and are probably going to deviate further, as AIs will be better physicists and engineers than us and will leverage the strangeness of our physics much further than we do), then what’s the point of that “proof”?
How does having this kind of “proof” increase our confidence in what seems informally correct for a single-branch reality and rather uncertain in a presumed multiverse? (We don’t even know if we are in a multiverse, so bringing a multiverse in might, indeed, be one of the possible objections to the statement. But I don’t know if one wants to pursue this line of discourse, because it is much more complicated than what we are doing here so far.)
(as an intellectual exercise, a proof like that is still of interest, even under the unrealistic assumption that we live in a computable reality, I would not argue with that; it’s still interesting)
yes, but thanks to decoherence this generally doesn’t affect macroscopic variables. Branches are causally independent once they have split.
No. I can only repeat my reference to Fabric of Reality as a good presentation of MWI and remind you that we do not live in a classical world, which is easy to confirm empirically.
And there are plenty of known macroscopic quantum effects already, and that list will only grow. Lasers are quantum, superfluidity and superconductivity are quantum, and so on.
Decoherence means that different branches don’t interfere with each other on macroscopic scales. That’s just the way it works.
Superfluids/superconductors/lasers are still microscopic effects that only matter at the scale of atoms or at ultra-low temperature or both.
No, not microscopic.
Coherent light produced by lasers is not microscopic; we see its traces in the air. And we see the consequences (old-fashioned holography and the ability to cut things with focused light, even at large distances). Room temperature is fine for that.
Superconductors used in industry are not microscopic (and the temperatures are high enough to enable their industrial use in rather common devices such as MRI scanners).
And I personally think that superintelligence leading to good trajectories is possible. It seems unlikely that we are in a reality where there is a theorem to the contrary.
It feels intuitively likely that it is possible to have a superintelligence, or an ecosystem of superintelligences, which is wise enough to navigate well.
But I doubt that one is likely to be able to formally prove that.
E.g. it is possible that we are in a reality where very cautious and reasonable, but sufficiently advanced experiments in quantum gravity lead to a disaster.
Advanced systems are likely to reach those capabilities, and they might make very reasonable estimates that it’s OK to proceed, but due to bad luck of being in a particularly unfortunate reality, the “local neighborhood” might get destroyed as a result… One can’t prove that it’s not the case...
Whereas, if the level of overall intelligence remains sufficiently low, we might not be able to ever achieve the technical capabilities to get into the danger zone...
It is logically possible that the reality is like that.
Yes, it is. But even if that is the case, by the argument given in this post, there must exist an AI system that avoids the danger zone.
Yes, possibly.
Not by the argument given in the post (considering quantum gravity, one immediately sees how inadequate and unrealistic the model in the post is).
But yes, it is possible that they will be so wise that they will be cautious enough even in a very unfortunate situation.
Yes, I was trying to explicitly refute your claim, but my refutation has holes.
(I don’t think you have a valid proof, but this is not yet a counterexample.)