I have strong-upvoted this post because I think that a discussion about the possibility of alignment is necessary. However, I don’t think an impossibility proof would change very much about our current situation.
To stick with the nuclear bomb analogy, we already KNOW that the first uncontrolled nuclear chain reaction will definitely ignite the atmosphere and destroy all life on earth UNLESS we find a mechanism to somehow contain that reaction (solve alignment/controllability). As long as we don’t know how to build that mechanism, we must not start an uncontrollable chain reaction. Yet we just throw more and more enriched uranium into a bucket and see what happens.
Our problem is not that we don’t know whether solving alignment is possible. As long as we haven’t solved it, this is largely irrelevant in my view (you could argue that we should stop spending time and resources on trying to solve it, but I’d argue that even if it were impossible, trying to solve alignment can teach us a lot about the dangers associated with misalignment). Our problem is that so many people don’t realize (or admit) that there is even a possibility of an advanced AI becoming uncontrollable and destroying our future anytime soon.
Lots of people, when confronted with various reasons why AGI would be dangerous, object that it’s all speculative, or just some sci-fi scenarios concocted by people with overactive imaginations. I think a rigorous, peer-reviewed, authoritative proof would strengthen the position against these sorts of objections.
I agree that a proof would be helpful, but probably not as impactful as one might hope. A proof of impossibility would have to rely on certain assumptions, like “superintelligence” or whatever, which could also be doubted or called sci-fi.
No, actually: assuming that the machinery has a hard substrate and is self-maintaining is enough.
Now that you mention it, it does seem a bit odd that there hasn’t even been one rigorous, logically correct, and fully elaborated (i.e. all axioms enumerated) paper on this topic.
Or even a long post; there’s always something stopping it short of the ideal: some logic error, some glossed-over assumption, etc.
There are a few papers on AI risks, and I think they were pretty solid. But the problem is that, however one does it, it remains in the realm of conceptual, qualitative discussion if we can’t first agree on formal definitions of AGI or alignment that someone can then Do Math on.
Yes, that’s part of what I meant by enumerating all axioms. Papers just assume every potential reader understands the same definition of ‘AGI’, ‘AI’, etc., when clearly that is not the case.
Since there isn’t an agreed-upon formal definition in the first place, that seems like the problem to tackle before anything downstream.
Well, that’s mainly a problem with not even having a clear definition of intelligence as a whole. We might have better luck with more focused definitions like a “recursive agent” (by which I mean an agent whose world model is general enough to include itself).
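Just to make that concrete, here is a minimal, purely illustrative sketch of what a “recursive agent” in this sense could look like as a formal object one could start to do math on. The names (`WorldModel`, `RecursiveAgent`) and the specific property checked are my own assumptions, not an established formalism:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WorldModel:
    """Toy world model: a set of named entities the agent can reason about."""
    entities: Dict[str, Any] = field(default_factory=dict)

    def contains(self, name: str) -> bool:
        return name in self.entities


@dataclass
class RecursiveAgent:
    """Illustrative 'recursive agent': a model of the agent appears inside its own world model."""
    name: str
    world_model: WorldModel = field(default_factory=WorldModel)

    def __post_init__(self) -> None:
        # The defining property: the world model is general enough to include the agent itself.
        self.world_model.entities[self.name] = self

    def is_recursive(self) -> bool:
        return self.world_model.contains(self.name)


agent = RecursiveAgent(name="agent_0")
assert agent.is_recursive()  # the world model contains the agent that holds it
```

Obviously a toy like this proves nothing by itself; the point is only that a definition at roughly this level of precision is the kind of thing one could actually formalize and reason about, unlike “AGI” as the term is usually used.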
Like dr_s stated, I’m contending that a proof would be qualitatively different from “very hard”, and powerful ammunition for advocating a pause...
Senator X: “Mr. CEO, your company continues to push the envelope and yet we now have proof that neither you nor anyone else will ever be able to guarantee that humans remain in control. You talk about safety and call for regulation but we seem to now have the answer. Human control will ultimately end. I repeat my question: Are you consciously working to replace humanity? Do you have children, sir?”
AI expert to Xi Jinping: “General Secretary, what this means is that we will not control it. It will control us. In the end, Party leadership will cede to artificial agents. They may or may not adhere to communist principles. They may or may not believe in the primacy of China. Population advantage will become nothing because artificial minds can be copied 10 billion times. Our own unification of mind, purpose, and action will pale in comparison. Our chief advantages of unity and population will no longer exist.”
AI expert to US General: “General, think of this as building an extremely effective infantry soldier who will become CJCS then POTUS in a matter of weeks or months.”
Like I wrote in my reply to dr_s, I think a proof would be helpful, but probably not a game changer.
Mr. CEO: “Senator X, the assumptions in that proof you mention are not applicable in our case, so it is not relevant for us. Of course we make sure that assumption Y does not hold when we build our AGI, and assumption Z is pure science fiction.”
What the AI expert says to Xi Jinping and to the US general in your example doesn’t rely on an impossibility proof in my view.
Yes. Valid. The question is how to avoid reducing it to a toy problem, or making such narrowing assumptions (in order to achieve a proof) that Mr. CEO can dismiss it.
When I revise, I’m going to work backwards with CEO/Senator dialog in mind.