I do agree that trying to hack the password is a smarter method for the AI to try. I was simply showing an example of a task that an AI would want to do, but would be unable to because of computational intractability.
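To make concrete the scale of intractability I have in mind, here is a back-of-the-envelope sketch (the guessing rate is my own deliberately generous assumption, not a real hardware figure) of how exhaustive password/key search blows up with key length:

```python
# Rough sketch: years needed to exhaust a keyspace by brute force,
# assuming an absurdly optimistic guessing rate (my own made-up number).

GUESSES_PER_SECOND = 1e18      # deliberately generous assumption
SECONDS_PER_YEAR = 3.15e7

def years_to_exhaust(entropy_bits: float) -> float:
    """Years needed to try every key in a keyspace of `entropy_bits` bits."""
    return (2 ** entropy_bits) / GUESSES_PER_SECOND / SECONDS_PER_YEAR

for bits in (40, 80, 128):
    print(f"{bits:>3}-bit keyspace: ~{years_to_exhaust(bits):.1e} years")

# 40-bit:  ~3.5e-14 years (instantaneous)
# 80-bit:  ~3.8e-02 years (about two weeks)
# 128-bit: ~1.1e+13 years (hundreds of times the age of the universe)
```

No cleverness inside the brute-force framing helps here; the task as posed scales exponentially with key length.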
I chose the example of Yudkowsky’s plan for my analysis because he has described it as his “lower bound” plan. After spending two decades on AI safety, talking to all the most brilliant minds in the field, this is apparently what he thinks the most convincing plan for AI takeover is. If I believe this plan is intractable (and I very much believe it is), then it opens up the possibility that all such plans are intractable. And if you do find a tractable plan, then making that plan intractable would be an invaluable AI safety cause area.
Proving that something is computationally intractable under a certain restricted model only means that the AI must find a way to step outside of your model, or do something else you didn’t think of.
Imagine if I made the claim that a freshly started AGI in a box could kill everyone on earth in under a minute. I propose that it creates some sort of gamma ray burst that hits everyone on earth simultaneously. You come back to me with a detailed proof that the plan is bonkers and wouldn’t work. I then respond “sure, that wouldn’t work, but the AI is way smarter than me, so it would figure something else out”.
My point is that, factually, some tasks are impossible. My belief is that a computationally tractable plan for guaranteeing success at an x-risk-level takeover does not currently exist, although I think a plan with something like a 0.01% chance of success might. If you think otherwise, you have to actually prove it, not just assume it.
Well, “opens up the possibility that all such plans are intractable” is a much weaker claim than “impossible”, and I disagree about the concrete difficulty of at least one of the steps in your plan: there are known toxins with ~100% lethality to humans in nature.
Distributing this toxin via a virus engineered using known techniques from GoF research and some nanotechnology for a timer seems pretty tractable, and close enough to 100% lethal to me.
The tech to build a timer circuit out of RNA and ATP instead of silicon and electricity doesn’t currently exist, AFAIK, but a timer meeting the relevant complexity, size, and energy constraints is certainly tractable to design at nanoscale in silicon. Moving to a biological substrate might be hard, but knowing a bit about what hardware engineers are capable of doing with silicon, often with extremely limited energy budgets, it certainly doesn’t seem intractable for human designers, let alone for an ASI, to do similar things with biology.
So I’m a bit skeptical of your estimate of the other steps as “probably incomputable”!
Also, a more general point: you’ve used “incomputable” throughout, in what appears to be an informal way of saying “computationally intractable”.
In computational complexity theory, “uncomputable”, “undecidable”, “NP-complete”, and Big-O notation have very precise technical meanings: they are statements about entire classes of problems and their limiting behavior. They don’t necessarily imply anything about particular concrete instances of such problems.
So it’s not just that there are good approximations for solving the traveling salesman problem in general or probabilistically, which you correctly note.
It’s that, for any particular instance of the traveling salesman problem (or any other NP-hard problem), approximating or solving that particular instance may be tractable or even trivial, for example, by applying a specialized algorithm, or because the particular instance of the problem you need to solve has exploitable regularities or is otherwise degenerate in some way.
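To illustrate with a toy example of my own (not anything from your post): the standard fact that, for cities in convex position, the optimal TSP tour is simply the convex-hull order means that such an instance of an NP-hard problem collapses to an O(n log n) sort:

```python
# Toy example: a TSP instance whose structure makes the optimum trivial.
# Cities evenly spaced on a circle are in convex position, so the optimal
# tour is just the hull (angular) order; no exponential search needed.
import math

def circle_instance(n, radius=1.0):
    """n cities evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def tour_length(cities, order):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

cities = circle_instance(1000)
optimal = sorted(range(len(cities)),
                 key=lambda i: math.atan2(cities[i][1], cities[i][0]))
print(tour_length(cities, optimal))   # ~6.28, the perimeter of the unit circle
```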
The same is true of e.g. the halting problem, which is provably undecidable in general! And yet, many programs that we care about can be proved to halt, or proved not to halt, in very reasonable amounts of time, often trivially by running them, or by inspection of their source. In fact, for a given randomly chosen program (under certain sampling assumptions), it is overwhelmingly likely that whether it halts or not is decidable. See the reference in this footnote for more.
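As a minimal sketch of that last point (my own toy construction, not the footnote’s reference): halting is undecidable in general, yet for any deterministic program whose run either terminates, revisits a state, or stays within a bounded state space, you can decide it by straightforward simulation:

```python
# Sketch: decide halting for a deterministic program given as a pure step
# function. Undecidability in general doesn't stop this from working on a
# huge range of concrete instances.

def halts(step, initial_state, max_states=10**6):
    """Return True if the program halts, False if it provably loops forever.

    `step(state)` returns the next state, or None when the program halts.
    Raises if the run neither halts nor cycles within `max_states` steps.
    """
    seen = set()
    state = initial_state
    while state is not None:
        if state in seen:
            return False              # revisited an exact state: loops forever
        seen.add(state)
        if len(seen) > max_states:
            raise RuntimeError("undecided within the state budget")
        state = step(state)
    return True                       # reached the halt condition

# E.g. the Collatz iteration starting from 27 (no general proof known, but
# this particular instance is trivially decided by just running it):
def collatz(n):
    return None if n == 1 else (n // 2 if n % 2 == 0 else 3 * n + 1)

print(halts(collatz, 27))             # -> True
```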
The point of all of this is that I think saying something is “probably incomputable” is just too imprecise and informal to be useful as a bound on the capabilities of a superintelligence (or even on human designers, for that matter), and trying to make the argument more precise probably causes it to break down, or requires formulating the problem in a domain where results from computational complexity theory are simply not applicable.