In my view, the field of cybersecurity is currently very far from what “theoretically perfect security” would look like. I am not sure how far ahead private knowledge is on the topic, but publicly, cybersecurity seems to focus on defending against security holes that have already been demonstrated to be exploitable, while also providing some probabilistic defense against others. This seems to work well in practice, though I don’t know why. (Maybe highly motivated threat actors with sufficient resources simply don’t exist?)
Conventional approaches work well if your adversary is limited, but Eliezer gives good arguments for why alignment orgs should seriously beef up security.
If you take this to the extreme and allow an unrestricted attacker (like an AGI), you pretty much turn security into the builder-breaker game. Let me demonstrate what I mean by playing it against myself:
B: We have a private GitHub project.
A: You don’t know anything about GitHub; I could easily just hack them or have an insider at Microsoft.
B: Okay, we will self-host a GitLab instance.
A: Where are you hosting it, Azure? Remember, I have an insider at MS. (And at every other cloud provider as well.)
B: Okay, we will buy our own server and stick it in a colocation data center.
A: How are you buying the server? I have active surveillance on you and can tamper with any package you receive. I will install a hardware backdoor in the server.
B: Fine, I will buy a Raspberry Pi, carefully inspect it for any hardware modifications, and compare every trace on the PCB to the schematics.
A: I have an insider at Broadcom giving me access to their hardware backdoor built into all of their SoCs.
B: Hm… ok, we will use a RISC-V SoC instead that is completely open source.
A: Back to the software side: GitLab is a large piece of software, you don’t seriously believe that it doesn’t have any 0-days?
B: True. Well, GitLab is far too large to audit properly, so I will just sandbox the whole thing. I will put it in its own container and only allow access to it over our VPN.
A: Hm… well I can backdoor GitLab itself as well. I will coerce the maintainer of some obscure Go package that GitLab depends on to add my backdoor.
B: Well, I can just restrict internet access for the GitLab container. What good is your backdoor then?
A: Right, right. Well… have you heard that CPUs can be used as radio transmitters?
B: Easy, just wrap the whole server in aluminium foil. Anything else?
A: We are not done with software yet. Linux has remotely exploitable 0-days as well. And good luck auditing the whole Linux codebase.
B: Okay, we will switch to Xen. (Or somehow turn seL4 into a hypervisor.) One domain runs GitLab on Linux without network access; another runs our custom, audited ring-0 WireGuard stack.
A: Fine, let’s assume for now that your software is secure. So have you heard of this company called TSMC that produces most of the world’s chips...
And so on. Just as with alignment, I don’t really know whether this game can be won by the builder.
I know. But I’m not aiming for the Excellent level.
Yeah, I know you are looking for more practical advice here; that’s why I posted this as a comment instead of an answer.
Eventually someone will have to aim for the “Excellent” level though (even if not against humans, surely against an AGI), and I just wanted to highlight that this is very much an unsolved problem.
Agree