And it’s a really difficult epistemic environment, since someone who was incorrectly convinced by a misinterpretation of a concrete example they think is too dangerous to share is still wrong; the secrecy just means nobody can check the example and point out the error.
I agree that this is true, and very unfortunate; I agree with / like most of what you say.
But overall, I think that if you’re an org with secret information on the basis of which you think laws should be passed, you need to be absolutely above reproach in your reasoning, evidence, funding, and bias. This is an extraordinary claim to make in a democratic society, and it should be treated as such: the reasoning you do show should be extremely legible, offer ways to be falsified, and not overextend in its claims. You should invite trusted people who disagree with you into adversarial collaborations, and pay them for their time. Etc etc etc.
I think, for instance, that rather than leaping from an experiment that maybe shows risk to offering policy proposals in the very same paper, it would be better to explain carefully (1) what overall models the paper’s authors have of biological risks, how LLMs contribute to them (open-sourced or not, jailbroken or not, and so on), and what the total increase in that risk is, and (2) what would constitute evidence that LLMs don’t contribute to the risk overall, and so on.
I’m not sure that passing laws based on private information that would be harmful to reveal is actually extraordinary in a democratic society, at least within security specifically? For example, laws about nuclear secrets or spying are often based on what intelligence agencies know about the threats.
But I do still think the direction you’re describing is a very valuable one, and I would be enthusiastic about EA-influenced biosecurity research organizations taking that approach.
On the policy proposal, my understanding is that the argument is essentially:
1. Raw LLMs will meet both your principles soon, if not already.
2. If the “keep LLMs from saying harmful things” approach is successful, finished LLMs will have safeguards, so they won’t meet those principles.
3. Publicly released weights currently allow those safeguards to be removed easily.
4. Liability for misuse (a) encourages research into safeguards that can’t simply be removed, (b) incentivizes companies that generate weights and are considering releasing them (or their insurers) to do a good job of estimating how likely misuse is, and (c) discourages further weight releases while companies develop better estimates of that likelihood.
As far as I know, (1) is the only step here that’s influenced by information that would be hazardous to publish. But I do think it’s possible, reasonably safely, to make progress on understanding how well models meet those principles and how that’s likely to change over time, through more thorough risk evaluations than the one in the Gopal paper.
That’s true, but irrelevant. In oligarchies, you can just make shit up, à la “Iraq has weapons of mass destruction.” If the ruling elites want “secret information” to justify a law they want to pass, such information will conveniently materialize on cue.