What Eliezer seemed to be objecting to was someone proposing a successfully boxed AI as an example of why “able to destroy humanity” can’t be a part of the definition of “AI” (or more charitably, “artificial superintelligence”). For boxed AI to be such an example (as opposed to a good idea to actually strive toward), it only has to be not knowably impossible.
I see your point there. But I think this discussion sort of went in an irrelevant direction, albeit probably my fault for not being clear enough. When I put “powerful enough to destroy humanity” in that criterion, I mainly meant “powerful” as in “really powerful optimization process”, mathematical optimization power, not “power” as in direct influence over the world. We’re inferring that the former will usually lead fairly easily to the latter, but they are not identical. So “powerful enough to destroy humanity” would mean something like “powerful enough to figure out a good subjunctive plan to do so given enough information about the world, even if it has no output streams and is kept in an airtight safe at the bottom of the ocean”.
Reading back further into the context, I see your point. Imagining such an AI is sufficient, and Eliezer does seem to be confusing a priori with obvious. I expect that he just completed a pattern based on “AI box” and so didn’t really understand the point that was being made—he should have replied with a “Yes—But”. (I, of course, made a similar mistake inasmuch as I wasn’t immediately prompted to click back up the tree beyond Eliezer’s comment.)