Here is a thing that I think would be cool to analyze sometime: how difficult would it have been for AI systems to discover and leverage historical hardware-level vulnerabilities, assuming we had not discovered them yet? Like, it seems worth an analysis to understand how difficult things like Rowhammer, or more recent speculative-execution bugs like Spectre and Meltdown, would have been to discover, and how useful they would have been. It's not an easy analysis, but I can imagine the answer coming out obviously one way or the other if one engaged seriously with the underlying issue.
How would you avoid the data contamination issue where the AI system has been trained on the entire Internet and thus already knows about all of these vulnerabilities?
I suppose you could use models trained before the vulnerabilities were discovered?
Aren’t most of these famous vulnerabilities from before modern LLMs existed and thus part of their training data?
Sure, but does a vulnerability need to be famous to be useful information? I imagine there are many vulnerabilities on a spectrum from minor to severe, and from almost unknown to famous.
(Very naive take.) I would suspect this is medium-easily automatable by making detailed enough specs of existing hardware systems and the bugs in them (maybe synthetically generate weak systems with semi-obvious bugs and train on transcripts, which could allow generalization to harder ones). It also seems like the sort of thing that is particularly susceptible to AI >> human; the difficulty here is generating the appropriate data, and the languages for doing so already exist. See the sketch below for the flavor of system I mean.
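To gesture at the idea, here is a made-up example (all names hypothetical, and real training data would presumably be RTL or hardware specs rather than C) of the kind of "weak system with a semi-obvious bug" such a pipeline might emit:

```c
#include <stdint.h>

/* Hypothetical toy device model with a deliberately planted, semi-obvious
 * bug, of the sort a synthetic-data pipeline might generate. */
typedef struct {
    uint32_t regs[8];
    uint32_t secret_key;   /* sits directly after regs in memory */
} toy_device;

void toy_device_write(toy_device *dev, uint32_t index, uint32_t value) {
    /* Planted bug: no check that index < 8, so index == 8 silently
     * overwrites secret_key, the kind of flaw a model would be trained
     * to spot in transcripts. */
    dev->regs[index] = value;
}
```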
Why hardware bugs in particular?
Can AI hack into LessWrong’s database?
This seems like a strictly easier task than discovering Rowhammer or Spectre.
(The hard part is discovering the vulnerability, not writing the code for the exploit assuming you had a one-paragraph description. The sketch below shows how little code the core takes.)
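For example, here is a minimal sketch in C of the core Rowhammer hammering loop, assuming you already know the trick. addr_a and addr_b are assumed to map to different rows of the same DRAM bank; finding such a pair, and then detecting the induced bit flips, is most of the remaining work:

```c
#include <stdint.h>
#include <emmintrin.h>  /* x86 SSE2 intrinsic _mm_clflush */

/* Repeatedly activate two DRAM rows to induce bit flips in neighboring
 * rows. Assumes addr_a and addr_b map to different rows of the same bank
 * (finding such a pair is the hard, platform-specific part). */
static void hammer(volatile uint8_t *addr_a, volatile uint8_t *addr_b,
                   long iterations) {
    for (long i = 0; i < iterations; i++) {
        (void)*addr_a;                      /* read row A */
        (void)*addr_b;                      /* read row B */
        _mm_clflush((const void *)addr_a);  /* evict so the next reads go to DRAM */
        _mm_clflush((const void *)addr_b);
    }
}
```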
Have you read the Wikipedia pages for these attacks? My intuition is that they require first-principles thinking to discover; you're unlikely to stumble on them simply by generating a lot of data from the processor and searching for patterns in it.
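To illustrate the point: the canonical Spectre v1 gadget is tiny once you have the insight, and the insight is the conceptual leap of combining branch misprediction with a cache timing side channel. A sketch of the gadget as described in the Spectre paper (a working proof of concept additionally needs branch-predictor training, cache flushing, and a timing loop, and behavior varies by CPU):

```c
#include <stddef.h>
#include <stdint.h>

uint8_t  array1[16];
size_t   array1_size = 16;
uint8_t  array2[256 * 4096];   /* probe array: one cache line per byte value */
volatile uint8_t sink;         /* keeps the load from being optimized away */

void victim(size_t x) {
    if (x < array1_size) {     /* bounds check the CPU may speculatively bypass */
        /* Under misspeculation with an out-of-bounds x, the secret byte
         * array1[x] selects which line of array2 gets cached; the attacker
         * later times reads of array2 to recover that byte. */
        sink = array2[array1[x] * 4096];
    }
}
```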