[Note: this, and all comments on this post unless specified otherwise, is written with my ‘LW user’ hat on, not my ‘LW Admin’ or ‘MIRI employee’ hat on, and thus is my personal view instead of the LW view or the MIRI view.]
As someone who thinks about AGI timelines a lot, I find myself dissatisfied with this post because it’s unclear what the “AI Timelines Scam” you’re talking about actually is, and I’m worried that if I poke at the pieces it’ll feel like a motte and bailey: it seems quite reasonable to me that ‘73% of tech executives thinking that the singularity will arrive in <10 years is probably just inflated “pro-tech” reasoning,’ but it seems quite unreasonable to suggest that strategic considerations about dual-use technology should be discussed openly (or, at least, that they should be discussed openly because tech executives have distorted beliefs). It also seems like there’s an argument for weighting urgency in planning that could lead to ‘distorted’ timelines while being a rational response to uncertainty.
On the first point, I think the following might be a fair description of some thinkers in the AGI space, but I don’t think it is a fair summary of MIRI (and it’s illegible, to me at least, whether you intend this to be a summary of MIRI):
This bears similarity to some conversations on AI risk I’ve been party to in the past few years. The fear is that Others (DeepMind, China, whoever) will develop AGI soon, so We have to develop AGI first in order to make sure it’s safe, because Others won’t make sure it’s safe and We will. Also, We have to discuss AGI strategy in private (and avoid public discussion), so Others don’t get the wrong ideas. (Generally, these claims have little empirical/rational backing to them; they’re based on scary stories, not historically validated threat models)
I do think it makes sense to write more publicly about the difficulties of writing publicly, but there’s always going to be something odd about it. Suppose I have 5 reasons for wanting discussions to be private, and 3 of them I can easily say. Discussing those three reasons will give people an incomplete picture that might seem complete, in a way that saying “yeah, the sum of factors is against” won’t. Further, without giving specific examples, it’s hard to see which of the hard-to-say reasons you would endorse and which you wouldn’t, and it’s not obvious to me that legibility is the best standard here.
But my simple sense is that openly discussing whether or not nuclear weapons were possible (a technical claim on which people might have private information, including intuitions informed by their scientific experience) would have had costs and it was sensible to be secretive about it. If I think that timelines are short because maybe technology X and technology Y fit together neatly, then publicly announcing that increases the chances that we get short timelines because someone plugs together technology X and technology Y. It does seem like marginal scientists speed things up here.
Now, I’m paying a price here; it may be the case that people have tried to glue together technology X and technology Y and it won’t work. I think private discussions on this are way better than no discussions on this, because it increases the chances that those sorts of crucial facts get revealed. It’s not obvious that public discussions are all that much better on these grounds.
On the second point, it feels important to note that the threshold for “take something seriously” is actually quite small. I might think that the chance that I have Lyme disease is 5%, and yet that motivates significant action because of hugely asymmetric cost considerations, or a rapid decrease in the efficacy of action. I think there’s often a problem where someone ‘has short timelines’ in the sense that they think 10-year scenarios should be planned for at all, but this can easily be mistaken for ‘they think 10-year scenarios are most likely,’ because often if you think both an urgent concern and a distant concern are possible, almost all of your effort goes into the urgent concern instead of the distant concern (as sensible critical-path project management would suggest).
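To make the threshold point concrete, here is a minimal sketch of the expected-cost comparison involved; the specific numbers are invented for illustration rather than taken from anything above.

```python
# Toy expected-cost comparison: a 5% probability can still warrant significant action
# when the costs are hugely asymmetric. All numbers here are invented for illustration.

p_lyme = 0.05                # subjective probability of actually having Lyme disease
cost_if_untreated = 100_000  # badness of leaving a real case untreated (arbitrary units)
cost_of_acting = 500         # badness of getting tested / treated early

expected_cost_if_ignored = p_lyme * cost_if_untreated  # 0.05 * 100,000 = 5,000
expected_cost_if_acted = cost_of_acting                # 500

print(expected_cost_if_ignored, expected_cost_if_acted)
# Acting wins by an order of magnitude despite the low probability, which is the
# sense in which the threshold for "take something seriously" is quite small.
```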
But my simple sense is that openly discussing whether or not nuclear weapons were possible (a technical claim on which people might have private information, including intuitions informed by their scientific experience) would have had costs and it was sensible to be secretive about it. If I think that timelines are short because maybe technology X and technology Y fit together neatly, then publicly announcing that increases the chances that we get short timelines because someone plugs together technology X and technology Y. It does seem like marginal scientists speed things up here.
I agree that there are clear costs to making extra arguments of the form “timelines are short because technology X and technology Y will fit together neatly”. However, you could still make public that your timelines are a given probability distribution D, and the reasons which led you to that conclusion are Z% object-level views which you won’t share, and (100-Z)% base rate reasoning and other outside-view considerations, which you will share.
I think there are very few costs to declaring which types of reasoning you’re most persuaded by. There are some costs to actually making the outside-view reasoning publicly available—maybe people who read it will better understand the AI landscape and use that information to do capabilities research.
But having a lack of high-quality public timelines discussion also imposes serious costs, for a few reasons:
1. It means that safety researchers are more likely to be wrong, and therefore end up doing less relevant research. I am generally pretty skeptical of reasoning that hasn’t been written down and undergone public scrutiny.
2. It means there’s a lot of wasted motion across the safety community, as everyone tries to rederive the various arguments involved, and figure out why other people have the views they do, and who they should trust.
3. It makes building common knowledge (and the coordination which that knowledge can be used for) much harder.
4. It harms the credibility of the field of safety from the perspective of outside observers, including other AI researchers.
Also, the bigger a risk you think point 1 is, the lower the costs of disclosure are, because it becomes more likely that any information gleaned from the disclosure is wrong anyway. And predicting the future is incredibly hard! So the base rate for correctness here is low. And I don’t think that safety researchers have a compelling advantage when it comes to correctly modelling how AI will reach human level (compared with thoughtful ML researchers).
Consider, by analogy, a debate two decades ago about whether to make public the ideas of recursive self-improvement and fast takeoff. The potential cost of that is very similar to the costs of disclosure now—giving capabilities researchers these ideas might push them towards building self-improving AIs faster. And yet I think making those arguments public was clearly the right decision. Do you agree that our current situation is fairly analogous?
EDIT: Also, I’m a little confused by
Suppose I have 5 reasons for wanting discussions to be private, and 3 of them I can easily say.
I understand that there are good reasons for discussions to be private, but can you elaborate on why we’d want discussions about privacy to be private?
I mostly agree with your analysis, especially the point about 1 (that the more likely I think my thoughts are to be wrong, the lower the cost of sharing them).
I understand that there are good reasons for discussions to be private, but can you elaborate on why we’d want discussions about privacy to be private?
Most examples here have the difficulty that I can’t share them without paying the costs, but here’s one that seems pretty normal:
Suppose someone is a student and wants to be hired later as a policy analyst for governments, and believes that governments care strongly about past affiliations and beliefs. Then it might make sense for them to censor themselves in public under their real name because of potential negative consequences of things they said when young. However, any statement of the form “I specifically want to hide my views on X” made under their real name has similar possible negative consequences, because it’s an explicit admission that the person has something to hide.
Currently, people hiding their unpopular opinions to avoid career consequences is fairly standard, and so it’s not that damning to say “I think this norm is sensible” or maybe even “I follow this norm,” but it seems like it would have been particularly awkward to be the first person to explicitly argue for that norm.
Tangent:

...if you think both an urgent concern and a distant concern are possible, almost all of your effort goes into the urgent concern instead of the distant concern (as sensible critical-path project management would suggest).
This isn’t obvious to me. And I would be interested in a post laying out the argument, in general or in relation to AI.

The standard cite is Owen CB’s paper Allocating Risk Mitigation Across Time. Here’s one quote on this topic:
Suppose we are also unsure about when we may need the problem solved by. In scenarios where the solution is needed earlier, there is less time for us to collectively work on a solution, so there is less work on the problem than in scenarios where the solution is needed later. Given the diminishing returns on work, that means that a marginal unit of work has a bigger expected value in the case where the solution is needed earlier. This should update us towards working to address the early scenarios more than would be justified by looking purely at their impact and likelihood.
[...]
There are two major factors which seem to push towards preferring more work which focuses on scenarios where AI comes soon. The first is nearsightedness: we simply have a better idea of what will be useful in these scenarios. The second is diminishing marginal returns: the expected effect of an extra year of work on a problem tends to decline when it is being added to a larger total. And because there is a much larger time horizon in which to solve it (and in a wealthier world), the problem of AI safety when AI comes later may receive many times as much work as the problem of AI safety for AI that comes soon. On the other hand one more factor preferring work on scenarios where AI comes later is the ability to pursue more leveraged strategies which eschew object-level work today in favour of generating (hopefully) more object-level work later.
The above is a slightly unrepresentative quote; the paper is largely undecided as to whether shorter-term strategies or longer-term strategies are more valuable (given uncertainty over timelines), and recommends a portfolio approach (running multiple strategies that each apply to different timelines). But that’s the sort of argument I think Vaniver was referring to.
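For concreteness, here is a toy version of the diminishing-returns point in the quote above; the logarithmic returns curve and the person-year numbers are assumptions of mine, picked only to show the shape of the argument, not figures from the paper.

```python
# Toy model of the quoted argument: if the value of safety work grows like
# log(1 + total_work), the marginal value of one more unit of work is roughly
# 1 / (1 + total_work). Scenarios where AI comes sooner accumulate less total
# work, so a marginal unit of work there is worth more. The person-year numbers
# below are illustrative assumptions, not estimates from the paper.

def marginal_value(total_work_years: float) -> float:
    return 1.0 / (1.0 + total_work_years)

scenarios = {
    "AI in ~10 years": 50,    # fewer person-years of safety work accumulate by then
    "AI in ~40 years": 500,   # far more accumulates (and in a wealthier world)
}

for name, work in scenarios.items():
    print(f"{name}: marginal value of one extra person-year = {marginal_value(work):.4f}")

# Output: 0.0196 for the early scenario vs 0.0020 for the late one -- roughly 10x.
# Even if you think the later scenario is more probable, this factor pushes planning
# toward short-timeline scenarios by more than their probability alone would suggest.
```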
Specifically, ‘urgent’ is measured by the difference between the time you have and the time the task will take. If I need the coffee to be done in 15 minutes and the bread to be done in an hour, and having the bread done in an hour requires preheating the oven now (whereas the coffee only takes 10 minutes to brew start to finish), then preheating the oven is urgent, whereas brewing the coffee has 5 minutes of float time. If I haven’t started the coffee in 5 minutes, it becomes urgent. See critical path analysis, Gantt charts, and so on.
This might be worth a post? It feels like it’d be low on my queue but might also be easy to write.
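In the meantime, here is a minimal sketch of that float calculation, using the coffee-and-bread numbers above (the task names and dict layout are just illustrative):

```python
# Float (slack) = time until the result is needed, minus time the task takes.
# A task with zero or negative float is on the critical path and is urgent now.

tasks = {
    # name: (minutes until it's needed, minutes it takes start to finish)
    "bread (preheat oven + bake)": (60, 60),
    "coffee (brew)": (15, 10),
}

for name, (needed_in, takes) in tasks.items():
    slack = needed_in - takes
    if slack <= 0:
        print(f"{name}: urgent now (no float)")
    else:
        print(f"{name}: {slack} minutes of float before it becomes urgent")

# Output: the bread/oven task is urgent immediately, while the coffee has
# 5 minutes of float -- matching the example above.
```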
It also seems like there’s an argument for weighting urgency in planning that could lead to ‘distorted’ timelines while being a rational response to uncertainty.
It’s important to do the “what are all the possible outcomes and what are the probabilities of each” calculation before you start thinking about weightings of how bad/good various outcomes are.
Could you say more about what you mean here? I don’t quite see the connection between your comment and the point that was quoted.
I understand the quoted bit to be pointing out that if you don’t know when a disaster is coming you _might_ want to prioritize preparing for it coming sooner rather than later (e.g. since there’s a future you who will be available to prepare for the disaster if it comes in the future, but you’re the only you available to prepare for it if it comes tomorrow).
Of course you could make a counter-argument that perhaps you can’t do much of anything in the case where disaster is coming soon, but in the long-run your actions can compound, so you should focus on long-term scenarios. But the quoted bit is only saying that there’s “an argument”, and doesn’t seem to be making a strong claim about which way it comes out in the final analysis.
Was your comment meant to suggest the possibility of a counter-argument like this one, or something else? Did you interpret the bit you quoted the same way I did?

Basically, don’t let your thinking on what is useful affect your thinking on what’s likely.
While there are often good reasons to keep some specific technical details of dangerous technology secret, keeping strategy secret is unwise.
In this comment, by “public” I mean “the specific intellectual public who would be interested in your ideas if you shared them”, not “the general public”. (I’m arguing for transparency, not mass-marketing)
Either you think the public should, in general, have better beliefs about AI strategy, or you think the public should, in general, have worse beliefs about AI strategy, or you think the public should have exactly the level of epistemics about AI strategy that it does.
If you think the public should, in general, have better beliefs about AI strategy: great, have public discussions. Maybe some specific discussions will be net-negative, but others will be net-positive, and the good will outweigh the bad.
If you think the public should, in general, have worse beliefs about AI strategy: unless you have a good argument for this, the public has reason to think you’re not acting in the public interest at this point, and that you’re likely acting against it.
There are strong prior reasons to think that it’s better for the public to have better beliefs about AI strategy. To the extent that “people doing stupid things” is a risk, that risk comes from people having bad strategic beliefs. Also, to the extent that “people not knowing what each other is going to do and getting scared” is a risk, the risk comes from people not sharing their strategies with each other. It’s common for multiple nations to spy on each other to reduce the kind of information asymmetries that can lead to unnecessary arms races, preemptive strikes, etc.
This doesn’t rule out that there may come a time when there are good public arguments that some strategic topics should stop being discussed publicly. But that time isn’t now.
There are strong prior reasons to think that it’s better for the public to have better beliefs about AI strategy.
That may be, but note that the word “prior” is doing basically all of the work in this sentence. (To see this, just replace “AI strategy” with practically any other subject, and notice how the modified statement sounds just as sensible as the original.) This is important because priors can easily be overwhelmed by additional evidence—and insofar as AI researcher Alice thinks a specific discussion topic in AI strategy has the potential to be dangerous, it’s worth realizing Alice probably has some specific inside view reasons to believe that’s the case. And, if those inside view arguments happen to require an understanding of the topic that Alice believes to be dangerous, then Alice’s hands are now tied: she’s both unable to share information about something, and unable to explain why she can’t share that information.
Naturally, this doesn’t just make Alice’s life more difficult: if you’re someone on the outside looking in, then you have no way of confirming if anything Alice says is true, and you’re forced to resort to just trusting Alice. If you don’t have a whole lot of trust in Alice to begin with, you might assume the worst of her: Alice is either rationalizing or lying (or possibly both) in order to gain status for herself and the field she works in.
I think, however, that these are dangerous assumptions to make. Firstly, if Alice is being honest and rational, then this policy effectively punishes her for being “in the know”—she must either divulge information she (correctly) believes to be dangerous, or else suffer an undeserved reputational hit. I’m particularly wary of imposing incentive structures of this kind around AI safety research, especially considering the relatively small number of people working on AI safety to begin with.
Secondly, however: in addition to being unfair to Alice, there are more subtle effects that such a policy may have. In particular, if Alice feels pressured to disclose the reasons she can’t disclose things, that may end up influencing the rate and/or quality of the research she does in the first place (Ctrl+F “walls”). This could have serious consequences down the line for AI safety research, above and beyond the object-level hazards of revealing potentially dangerous ideas to the public.
Given all of this, I don’t think it’s obvious that the best move at this point involves making all of the strategic arguments around AI safety public. (And note that I say this as a member of said public: I am not affiliated with MIRI or any other AI safety institution, nor am I personally acquainted with anyone who is so affiliated. This therefore makes me a direct counter-example to your claim about the public in general having reason to think secret-keeping organizations must be doing so for self-interested reasons.)
To be clear: I think there is a possible world in which your arguments make sense. I also think there is a possible world in which your arguments not only do not make sense, but would lead to a clearly worse outcome if taken seriously. It’s not clear to me which of these worlds we actually live in, and I don’t think you’ve done a sufficient job of arguing that we live in the former world instead of the latter.
If someone’s claiming “topic X is dangerous to talk about, and I’m not even going to try to convince you of the abstract decision theory implying this, because this decision theory is dangerous to talk about”, I’m not going to believe them, because that’s frankly absurd.
It’s possible to make abstract arguments that don’t reveal particular technical details, such as by referring to historical cases, or talking about hypothetical situations.
It’s also possible for Alice to convince Bob that some info is dangerous by giving the info to Carol, who is trusted by both Alice and Bob, after which Carol tells Bob how dangerous the info is.
If Alice isn’t willing to do any of these things, fine, there’s a possible but highly unlikely world where she’s right, and she takes a reputation hit due to the “unlikely” part of that sentence.
(Note, the alternative hypothesis isn’t just direct selfishness; what’s more likely is cliquish inner ring dynamics)
I haven’t had time to write my thoughts on when strategy research should and shouldn’t be public, but I note that this recent post by Spiracular touches on many of the points that I would touch on in talking about the pros and cons of secrecy around infohazards.
The main claim that I would make about extending this to strategy is that strategy implies details. If I have a strategy that emphasizes that we need to be careful around biosecurity, that implies technical facts about the relative risks of biology and other sciences.
For example, the US developed the Space Shuttle with a justification that didn’t add up (ostensibly it would save money, but it was obvious that it wouldn’t). The Soviets, trusting in the rationality of the US government, inferred that there must be some secret application for which the Space Shuttle was useful, and so developed a clone (so that when the secret application was unveiled, they could deploy it immediately instead of having to build their own shuttle from scratch at that point). If such an application had in fact existed, it seems likely that the Soviets could have found it by reasoning through “what do they know that I don’t?” when they might not have found it by reasoning from scratch.