There are strong prior reasons to think that it’s better for the public to have better beliefs about AI strategy.
That may be, but note that the word “prior” is doing basically all of the work in this sentence. (To see this, just replace “AI strategy” with practically any other subject, and notice how the modified statement sounds just as sensible as the original.) This is important because priors can easily be overwhelmed by additional evidence, and insofar as AI researcher Alice thinks a specific discussion topic in AI strategy has the potential to be dangerous, it’s worth realizing that Alice probably has some specific inside-view reasons to believe that’s the case. And if those inside-view arguments happen to require an understanding of the very topic that Alice believes to be dangerous, then Alice’s hands are tied: she’s both unable to share the information and unable to explain why she can’t share it.
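(To make the point about priors being overwhelmed concrete, here is a purely illustrative Bayesian calculation; the numbers are my own assumptions, not anything anyone in this thread has claimed. Suppose your prior odds that openness is the better policy for some topic are 4:1, and Alice reports inside-view evidence that is ten times likelier if the topic is dangerous than if it is harmless. In odds form, Bayes’ rule gives

$$\text{posterior odds} = \text{prior odds} \times \text{likelihood ratio} = \frac{4}{1} \times \frac{1}{10} = \frac{2}{5},$$

which corresponds to roughly a 71% posterior probability that the topic is dangerous. A reasonably confident prior in favor of openness can flip on a single piece of moderately strong inside-view evidence.)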
Naturally, this doesn’t just make Alice’s life more difficult: if you’re someone on the outside looking in, then you have no way of confirming if anything Alice says is true, and you’re forced to resort to just trusting Alice. If you don’t have a whole lot of trust in Alice to begin with, you might assume the worst of her: Alice is either rationalizing or lying (or possibly both) in order to gain status for herself and the field she works in.
I think, however, that these are dangerous assumptions to make. Firstly, if Alice is being honest and rational, then this policy effectively punishes her for being “in the know”—she must either divulge information she (correctly) believes to be dangerous, or else suffer an undeserved reputational hit. I’m particularly wary of imposing incentive structures of this kind around AI safety research, especially considering the relatively small number of people working on AI safety to begin with.
Secondly: in addition to being unfair to Alice, such a policy may have more subtle effects. In particular, if Alice feels pressured to disclose the reasons she can’t disclose things, that pressure may end up influencing the rate and/or quality of the research she does in the first place (Ctrl+F “walls”). This could have serious consequences down the line for AI safety research, above and beyond the object-level hazards of revealing potentially dangerous ideas to the public.
Given all of this, I don’t think it’s obvious that the best move at this point involves making all of the strategic arguments around AI safety public. (And note that I say this as a member of said public: I am not affiliated with MIRI or any other AI safety institution, nor am I personally acquainted with anyone who is so affiliated. This therefore makes me a direct counter-example to your claim about the public in general having reason to think secret-keeping organizations must be doing so for self-interested reasons.)
To be clear: I think there is a possible world in which your arguments make sense. I also think there is a possible world in which your arguments not only do not make sense, but would lead to a clearly worse outcome if taken seriously. It’s not clear to me which of these worlds we actually live in, and I don’t think you’ve done a sufficient job of arguing that we live in the former world instead of the latter.
If someone’s claiming “topic X is dangerous to talk about, and I’m not even going to try to convince you of the abstract decision theory implying this, because this decision theory is dangerous to talk about”, I’m not going to believe them, because that’s frankly absurd.
It’s possible to make abstract arguments that don’t reveal particular technical details, such as by referring to historical cases, or talking about hypothetical situations.
It’s also possible for Alice to convince Bob that some info is dangerous by giving the info to Carol, who is trusted by both Alice and Bob, after which Carol tells Bob how dangerous the info is.
If Alice isn’t willing to do any of these things, fine: there’s a possible but highly unlikely world where she’s right, and she takes a reputational hit due to the “unlikely” part of that sentence.
(Note: the alternative hypothesis isn’t just direct selfishness; what’s more likely is cliquish inner-ring dynamics.)