I’m very much an outsider to this discussion, and by no means a “professional researcher”, but I believe those to be the primary reasons why I’m actually qualified to make the following point. I’m sure it’s been made before, but a quick scan revealed no statement of this argument quite as direct and explicit.
HoldenKarnofsky:
(...) my view is that SI’s suggested approach to AGI development is more dangerous than the “traditional” approach to software development, and thus that SI is advocating for an approach that would worsen risks from AGI.
I’ve always understood SI’s position on this matter not as one of “We should not focus on building Tool AI! Fully reflectively self-modifying AGIs are the only way to go!”, but rather that it is extremely unlikely that we can prevent everyone else from building one.
To my understanding, the logic goes: if any programmer with the relevant skills is sufficiently convinced, by whatever means and for whatever reasons, that building a full traditional AGI is more efficient, that it will more “lazily” achieve their goals with fewer resources or achieve them faster, then that programmer will build it whether you think it’s a good idea or not. As such, SI’s “moral imperative” is to account for this scenario, since there is a non-negligible probability of it actually happening; if they do not, they effectively become hypocritical in claiming to work towards reducing existential AI risk.
To reiterate with silly scare-formatting: It is completely irrelevant, in practice, what SI “advocates” or “promotes” as a preferred approach to building safe AI, because the probability that someone, somewhere, some day is going to use the worst possible approach is definitely non-negligible. If there is not already a sufficiently advanced Friendly AI in place to counter such a threat, we are then effectively defenseless.
To put it metaphorically, this is a case of: “It doesn’t matter if you think only using remote-controlled battle robots would be a better way to resolve international disputes. At some point, someone somewhere is going to be convinced that killing all of you is going to be faster and cheaper and more certain of achieving their goals, so they’ll build one giant bomb and throw it at you without first making sure they won’t kill themselves in the process.”
This looks similar to a point Kaj Sotala made. My own restatement: as the body of narrow AI research devoted to making tools grows larger and larger, building agent AGI gets easier and easier, and there will always be a few Shane Legg types who are crazy enough to try it.
I sometimes suspect that Holden’s true rejection of endorsing SI is that the optimal philanthropy movement is fringe enough already, and he doesn’t want to associate it with nutty-seeming beliefs about near-inevitable doom from superintelligence. Sometimes I wish SI would market themselves as being similar to nuclear risk organizations like the Bulletin of the Atomic Scientists. After all, EY was an AI researcher who quit and started working on Friendliness when he saw the risks, right? I think you could make a pretty good case for SI’s usefulness just by arguing from analogies with nuclear risk, without any mention of FOOM or astronomical waste or paperclip maximizers.
Ideally we’d have wanted to know about nuclear weapon risks before building them, not afterwards, right?
Personally, I highly doubt that is Holden’s true rejection, though it is most likely one of the emotional considerations that cannot be ignored from a strategic perspective. Holden claims to have gone through most of the relevant LessWrong sequences and SIAI’s public presentation material, which I believe makes deceptive (or self-deceptive) argumentation less likely.
No, what I believe to be the real issue is that Holden and (most of) SIAI have disagreements over many of the specific claims used to justify broader claims; if the specific claims are granted in principle, both seem to generally agree, in good Bayesian fashion, on the broader or more general claim. Much of the disagreement on those specifics also appears to stem from different priors in ethical and moral values, as well as from differences in their evaluations and models of human population behavior and in their specific (but often unspecified) “best guess” probabilities.
For a generalized example, one strong claim for existential risk reduction being the optimal use of effort is that even a minimal decrease in risk provides immense expected value, simply from the sheer magnitude of what humanity could most likely achieve throughout the rest of its existence. Many experts and scientists outright reject this on the grounds that “future, intangible, merely hypothetical other humans” should not be assigned value on the same order of magnitude as current humans, or even one order of magnitude lower.
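To make the arithmetic behind that disagreement explicit, here is a minimal sketch; the symbols and example numbers are mine, chosen purely for illustration, not anything SI has published:

```latex
% All numbers are placeholders chosen only to show the shape of the argument.
%   N     : number of potential future lives (e.g. 10^16)
%   delta : reduction in extinction probability bought by an intervention (e.g. 10^-6)
%   w     : moral weight of a future person relative to a current one
\[
  E[\mathrm{value}] = \delta \cdot N \cdot w ,
  \qquad \mathrm{e.g.} \quad
  10^{-6} \cdot 10^{16} \cdot w = 10^{10} \, w .
\]
```

On this framing, the dispute is essentially over w: if future people are weighted anywhere near current people, even a tiny delta dominates almost any other cause; if w is set close to zero, the argument collapses.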
but rather that it is extremely unlikely that we can prevent everyone else from building one.
Well, SI’s mission makes sense on the premise that the best way to prevent a badly built AGI from being developed or deployed is to build a Friendly AGI which has that as one of its goals. ‘Best way’ here is a compromise between, on the one hand, the effectiveness of the FAI relative to other approaches and, on the other, the danger presented by the FAI itself relative to other approaches.
So I think Holden’s position is that the ratio of danger vs. effectiveness does not weigh favorably for FAI as opposed to tool AI. So to argue against Holden, we would have to argue either that FAI will be less dangerous than he thinks, or that tool AI will be less effective than he thinks.
I take it the latter is the more plausible.

Indeed, we would have to argue that to argue against Holden.
My initial reaction was to counter this with the claim that we should not be arguing against anyone in the first place, but rather looking for probable truth (concentrating anticipations). And then I realized how stupid that was: Arguments Are Soldiers. If SI (and, by the Blue vs. Green principle, any SI supporter) can’t even defend a few claims and defeat its opponents, it is obviously stupid and not worth paying attention to.
SI needs some amount of support, yet support-maximization strategies carry a very high risk of introducing dangerous intellectual contamination in various forms (including self-reinforcing biases in the minds of researchers and future supporters) that could end up causing even more existential risk. At the same time, not gathering enough support quickly enough dramatically increases the risk that someone, somewhere, is going to trip on a power cable and poof, all humans are just gone.
I am definitely not masterful enough in mathematics and bayescraft to calculate the optimal route through this differential probabilistic maze, but I suspect others could provide a very good estimate.
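For what it’s worth, here is a toy way that maze could be framed; every symbol is invented for illustration and none of the curves are actually known:

```latex
% Illustrative framing of the support-gathering trade-off; nothing here is from SI.
%   r    : how aggressively support is gathered (outreach intensity)
%   C(r) : probability that aggressive outreach contaminates the research culture
%          (plausibly increasing in r)
%   L(r) : probability that an unsafe AGI arrives before enough support exists
%          (plausibly decreasing in r)
\[
  P_{\mathrm{doom}}(r) \approx C(r) + L(r) ,
  \qquad
  r^{*} = \arg\min_{r} \, P_{\mathrm{doom}}(r) .
\]
```

The sum is only an approximation (it treats the two failure modes as roughly exclusive), and the real difficulty is that neither curve is known with any precision.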
Also, it’s very much worth noting that these very considerations are, on a meta level, an integral part of SI’s mission, so figuring out whether the premise you stated is true or not, and whether there are better solutions or not, actually is SI’s objective. Basically, while I might understand some of the cognitive causes for it, I am still very much rationally confused when someone questions SI’s usefulness by questioning the efficiency of subgoal X, when SI’s original and (to my understanding) primary mission is precisely to evaluate the efficiency of subgoal X.