Quintin Pope comments on Quintin Pope’s Shortform

Quintin Pope 28 Mar 2023 3:34 UTC
10 points
0
One research direction I’d like to see is basically “extracting actionable intel from language model malware”.
I expect that the future will soon contain self-replicating AI viruses, that come with (small) LMs finetuned for hacking purposes, so that the malware can more flexibly adapt to whatever network it’s trying to breach and recognize a wider range of opportunities for extracting money from victims.
Given this, I expect there’s value in having some tools ready to help dissect the behavioral patterns of malware LMs, which might reveal important information about how they were trained, how they attack targets, how their C&C infrastructure works, and how to subvert that infrastructure.