“This begged the obvious question—why can’t humans use their previous weaker AIs they already built to fight back”
Would you call this the Antivirus model of AI safety?
“Fight” means violence. While Eliezer states that ASI systems will be able to negotiate deals with each other and avoid fighting, humans cannot (and probably should not) negotiate any deals with rogue superintelligences that have escaped human custody, even though, like terrorists, such systems could threaten to kill a non-zero number of humans. Deals can’t be negotiated for the same reason that ants can’t read a contract between humans [Eliezer’s reasoning], and because there is nothing to discuss with a hostile escaped superintelligence: all that matters is who has more and better weapons, since any deal will end in later defection [Geohot’s reasoning].
Because humans can use the precursor AI systems built before the ASI, the situation resembles the Batmobile analogy. The Batmobile is a dense, durable tank loaded with horsepower, armor, and weapons beyond anything the police or military have. In a realistic world, the military could deploy its own vehicles built on the same tech base and hunt down the criminal and serial night-time assaulter Bruce Wayne. Similarly, if the ASI gets drones and laser weapons, humans will have their own versions of those capabilities, just slightly worse ones, because the AI systems in service to humans are not as intelligent.
Even if that’s the case, the number of 0-days out there (and the generally shitty infosec landscape) is enough to pwn almost any valuable target.
While I’d appreciate some help to screen out the spammers and griefers, this doesn’t make me feel safe existentially.
What Geohot is talking about here (formally verified software) can be used to make software secure against any possible input that exploits a given class of bug. If you secure the software against every class of error that is possible, the resulting artifact will not be “pwnable” by any technical means, regardless of the intelligence or capability of the attacker.
Geohot notes that he ran into a lot of problems when he tried it, and that it’s an absurdly labor-intensive process. But theoretically, if cyberattacks from an escaped ASI were your threat model, this is what you would do in response: task AIs with translating all of your software, module by module, into a formal specification of what you meant, with human inspection and review, and then use captive ASIs, such as another instance of the same machine that escaped, to attempt to breach the software. The ASI red team gets read access to the formal source code and the compiler; your goal is to make software where this doesn’t matter, so that no untrusted input through any channel can compromise the system.
Here’s a nice simple example on Wikipedia: https://en.wikipedia.org/wiki/Dafny . Note that this type of formal language, where it gets translated to another language using an insecure compiler, would probably not withstand ASI-level cyberattacks. You would need to rewrite the compilers and tighten the spec of the target language you are compiling to.
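To give a flavor of what the verifier actually buys you, here is a minimal Dafny sketch in the style of the standard tutorial linear-search example (illustrative only, not a hardened artifact): the postconditions are proven at compile time for every possible input, so no input can drive the method out of bounds.

```dafny
// Linear search with a machine-checked specification.
// Dafny proves these postconditions hold for EVERY possible array and key,
// so no caller input can produce an out-of-bounds access or a wrong answer.
method FindIndex(a: array<int>, key: int) returns (index: int)
  ensures 0 <= index ==> index < a.Length && a[index] == key
  ensures index < 0 ==> forall k :: 0 <= k < a.Length ==> a[k] != key
{
  index := 0;
  while index < a.Length
    invariant 0 <= index <= a.Length
    invariant forall k :: 0 <= k < index ==> a[k] != key
  {
    if a[index] == key {
      return;
    }
    index := index + 1;
  }
  index := -1;
}
```

Drop the loop invariant or botch the bounds check and the program simply fails to verify, which is the property you would be leaning on at scale.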
It seems like this only guarantees security along some particular vector, which might be why current software doesn’t actually use this type of security.
And if you did actually close off software exploitation as a threat vector for ASI, wouldn’t it just choose a different, physical attack mode?
https://github.com/project-everest/mitls-fstar
It mostly doesn’t.
The vectors are classes of software error. Since ultimately it is all binary messages between computers, it is likely possible to build a robust set of solvers that covers all classes of software error that the underlying programming language permits, resulting in code that cannot be compromised by any possible binary message.
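To make “cannot be compromised by any possible binary message” concrete, here is an assumed toy sketch in Dafny (the message format and names are invented for illustration): the parser has no precondition, so the verifier forces it to handle every byte sequence an attacker could send, and rejects the code unless every access is proven in bounds.

```dafny
// Toy length-prefixed message parser with no precondition:
// it must be correct for EVERY possible byte sequence, hostile or not.
method ParsePayload(msg: seq<bv8>) returns (ok: bool, payload: seq<bv8>)
  ensures ok ==> |payload| <= |msg|
{
  ok, payload := false, [];
  if |msg| == 0 {
    return;
  }
  var declared := msg[0] as int;  // first byte claims the payload length
  if declared > |msg| - 1 {
    // Claimed length exceeds what was actually sent: the classic over-read setup.
    return;
  }
  // The verifier has already proven 1 + declared <= |msg|, so this slice cannot over-read.
  ok, payload := true, msg[1 .. 1 + declared];
}
```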
Yes. It becomes a battle between [ASI with robotically wielded weapons] and [humans plus weaker, more controllable ASI with robotically wielded weapons].
All computer systems are ultimately made of hardware, and hardware is much messier than the clean abstractions we call software. There are many real-world exploits (Rowhammer and Spectre, for example) that from a software point of view can’t possibly work, but do in fact work, because all abstractions are leaky and no hardware perfectly implements idealized program behaviour in reality.
Getting stuff formally specified is insanely difficult, thus impractical, thus pervasive verified software is impossible without some superhuman help. Here we go again.
Even going from “one simple spec” to “two simple specs” is a huge complexity jump (a toy sketch of the composition problem follows below): https://www.hillelwayne.com/post/spec-composition/
And real-world software has a huge state envelope.
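As an assumed toy illustration of the composition problem (not taken from the linked post): each of the two specs below is trivial on its own, but the property you actually care about when you compose them, conservation of money, is a new spec that has to be stated and proven separately.

```dafny
class Account {
  var balance: int

  method Deposit(amount: int)
    requires amount >= 0
    modifies this
    ensures balance == old(balance) + amount
  { balance := balance + amount; }

  method Withdraw(amount: int)
    requires 0 <= amount <= balance
    modifies this
    ensures balance == old(balance) - amount
  { balance := balance - amount; }
}

// The composed property does not fall out of the two simple specs above;
// it needs its own contract, its own aliasing assumption (src != dst), and its own proof.
method Transfer(src: Account, dst: Account, amount: int)
  requires src != dst
  requires 0 <= amount <= src.balance
  modifies src, dst
  ensures src.balance + dst.balance == old(src.balance) + old(dst.balance)
{
  src.Withdraw(amount);
  dst.Deposit(amount);
}
```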