How would you avoid the data contamination issue where the AI system has been trained on the entire Internet and thus already knows about all of these vulnerabilities?
Sure, but does a vulnerability need to be famous to be useful information? I imagine there are many vulnerabilities on a spectrum from minor to severe and from almost unknown to famous?
How would you avoid the data contamination issue where the AI system has been trained on the entire Internet and thus already knows about all of these vulnerabilities?
I suppose you could use models trained before vulnerabilities happen?
Aren’t most of these famous vulnerabilities from before modern LLMs existed and thus part of their training data?
Sure, but does a vulnerability need to be famous to be useful information? I imagine there are many vulnerabilities on a spectrum from minor to severe and from almost unknown to famous?