Indeed, I either didn’t see that or forgot about it; I was going from memory when I responded. So you may be right that Mayhem does RCE.
But what I distinctly remember from the DARPA challenge (again, from memory) is that their “hacking environment” was a simplified sandbox, not a full CISC machine with a complex OS.
In the “capture-the-flag” (CTF) hacking competition against humans (at DEFCON 2016), Mayhem came in last. That was 6 years ago, and we don’t know how good it is now (probably MUCH better; still, it is a commercial tool used in defense).
I am more worried about the attack tools developed behind closed doors. I recently read a piece about Bruce Schneier (“When AI Becomes the Hacker”, May 14, 2021):
The key is harnessing AIs for defense, like finding and fixing all vulnerabilities in a software program before it gets released. “We’d then live in a world where software vulnerabilities were a thing of the past,” he says.
I don’t share that optimism. Why? Complexity creates a combinatorial explosion, and with it an extremely large search space. What if vulnerabilities are like (bad) chess moves: could AI tell us, in advance, every mistake we could make? I don’t want to stretch this analogy, but the question is: is hacking a finite game (with a chance that we can remove all vulnerabilities), or is it an infinite game?
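To put a rough number on that combinatorial explosion, here is a back-of-the-envelope sketch. It assumes a toy program whose behavior is fully determined by n independent yes/no branches, and a hypothetical analyzer that checks one billion paths per second. Both the model and the speed are my own illustrative assumptions, not measurements of Mayhem or any other real tool.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def path_count(branches: int) -> int:
    """Distinct execution paths through `branches` independent binary decisions."""
    return 2 ** branches


def years_to_enumerate(branches: int, paths_per_second: int = 10**9) -> float:
    """Years needed to visit every path once at the assumed (hypothetical) speed."""
    return path_count(branches) / paths_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Toy sizes only; real programs have far more branches, and they are not independent.
    for n in (10, 50, 100, 200):
        print(f"{n:>3} branches: {path_count(n):.3e} paths, "
              f"~{years_to_enumerate(n):.3e} years to enumerate")
```

Real tools do not enumerate paths blindly, so this is only an intuition for how fast the search space grows, not an estimate of how hard any particular program is to analyze.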