Here is a thing that I think would be cool to analyze sometime: how difficult would it have been for AI systems to discover and leverage historical hardware-level vulnerabilities, assuming we had not discovered them yet? Like, it seems worth an analysis to understand how difficult things like Rowhammer, or more recent speculative-execution bugs like Spectre and Meltdown, would have been to discover, and how useful they would have been. It's not an easy analysis, but I can imagine the answer coming out obviously one way or the other if one engaged seriously with the underlying issue.
How would you avoid the data contamination issue where the AI system has been trained on the entire Internet and thus already knows about all of these vulnerabilities?
I suppose you could use models trained before the vulnerabilities were discovered?
Aren’t most of these famous vulnerabilities from before modern LLMs existed and thus part of their training data?
Sure, but does a vulnerability need to be famous to be useful information? I imagine there are many vulnerabilities on a spectrum from minor to severe, and from almost unknown to famous.
(Very naive take.) I would suspect this is medium-easily automatable by making detailed enough specs of existing hardware systems and the bugs in them (maybe synthetically generate weak systems with semi-obvious bugs and train on transcripts, which could allow generalization to harder ones). It also seems like the sort of thing that is particularly susceptible to AI >> human; the difficulty here is generating the appropriate data, and the languages for doing so already exist. See the sketch below for the flavor of system I mean.
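To gesture at the idea, here is a made-up example (all names hypothetical, and real training data would presumably be RTL or hardware specs rather than C) of the kind of "weak system with a semi-obvious bug" such a pipeline might emit:

```c
#include <stdint.h>

/* Hypothetical toy device model with a deliberately planted, semi-obvious
 * bug, of the sort a synthetic-data pipeline might generate. */
typedef struct {
    uint32_t regs[8];
    uint32_t secret_key;   /* sits directly after regs in memory */
} toy_device;

void toy_device_write(toy_device *dev, uint32_t index, uint32_t value) {
    /* Planted bug: no check that index < 8, so index == 8 silently
     * overwrites secret_key, the kind of flaw a model would be trained
     * to spot in transcripts. */
    dev->regs[index] = value;
}
```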
Why hardware bugs in particular?
Can AI hack into LessWrong’s database?
This seems like a strictly easier task than discovering Rowhammer or Spectre.
(The hard part is discovering the vulnerability, not writing the code for the exploit assuming you had a one-paragraph description. The sketch below shows how little code the core takes.)
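For example, here is a minimal sketch in C of the core Rowhammer hammering loop, assuming you already know the trick. addr_a and addr_b are assumed to map to different rows of the same DRAM bank; finding such a pair, and then detecting the induced bit flips, is most of the remaining work:

```c
#include <stdint.h>
#include <emmintrin.h>  /* x86 SSE2 intrinsic _mm_clflush */

/* Repeatedly activate two DRAM rows to induce bit flips in neighboring
 * rows. Assumes addr_a and addr_b map to different rows of the same bank
 * (finding such a pair is the hard, platform-specific part). */
static void hammer(volatile uint8_t *addr_a, volatile uint8_t *addr_b,
                   long iterations) {
    for (long i = 0; i < iterations; i++) {
        (void)*addr_a;                      /* read row A */
        (void)*addr_b;                      /* read row B */
        _mm_clflush((const void *)addr_a);  /* evict so the next reads go to DRAM */
        _mm_clflush((const void *)addr_b);
    }
}
```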
Have you read the Wikipedia pages for these attacks? My intuition is that they require first-principles thinking to discover; you're unlikely to stumble on them simply by generating a lot of data from the processor and searching for patterns in it.
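To illustrate the point: the canonical Spectre v1 gadget is tiny once you have the insight, and the insight is the conceptual leap of combining branch misprediction with a cache timing side channel. A sketch of the gadget as described in the Spectre paper (a working proof of concept additionally needs branch-predictor training, cache flushing, and a timing loop, and behavior varies by CPU):

```c
#include <stddef.h>
#include <stdint.h>

uint8_t  array1[16];
size_t   array1_size = 16;
uint8_t  array2[256 * 4096];   /* probe array: one cache line per byte value */
volatile uint8_t sink;         /* keeps the load from being optimized away */

void victim(size_t x) {
    if (x < array1_size) {     /* bounds check the CPU may speculatively bypass */
        /* Under misspeculation with an out-of-bounds x, the secret byte
         * array1[x] selects which line of array2 gets cached; the attacker
         * later times reads of array2 to recover that byte. */
        sink = array2[array1[x] * 4096];
    }
}
```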