From a systems engineering perspective, this is not how you would “play them off against each other”. You are describing two arbitrarily powerful ASI models with lavish amounts of spare compute to spend considering acausal trades and negotiations with simulated copies of each other.
This is not how any current software, at any scale I am aware of, actually works. What you do instead is fling sparse records around: for embedded systems, raw structs and flatbufs; for higher-level software, protobufs and JSON.
Ideally these requests carry no unnecessary fields. A service reads a request, generates an output, and, once the transaction completes, retains no memory of it. This is a “stateless microservice”.
An ASI used this way doesn’t know when it is running, doesn’t know who sent the request, and doesn’t know whether it is being watched. And because the model has been distilled down from a larger version of itself to run in less execution time, it likely cannot spare any runtime to consider acausal negotiations with itself.
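To make this concrete, here is a minimal sketch of what such a stripped-down record might look like as a C struct; every field name here is invented for illustration, not taken from any real schema:

```c
#include <stdint.h>

/* Hypothetical wire format for one request. Deliberately sparse:
 * no timestamp, no sender ID, no session token, no "is this a
 * test?" flag. Only the task itself. */
typedef struct {
    uint16_t task_type;     /* e.g. TREATMENT_REVIEW, STABILITY_CHECK */
    uint16_t payload_len;   /* bytes used in payload                  */
    uint8_t  payload[1024]; /* serialized task data (flatbuf etc.)    */
} InferenceRequest;

typedef struct {
    uint16_t verdict;       /* e.g. OK, ERROR_FOUND, NEEDS_REVISION   */
    uint16_t detail_len;
    uint8_t  detail[1024];  /* serialized findings                    */
} InferenceResponse;
```

Nothing in the record lets the model date the request, identify the caller, or distinguish production traffic from an audit.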
The ASI needs to evaluate a treatment plan for an ICU patient covering the next five minutes of care; if it misses an error, it may receive a later update to correct it.
Milliseconds later the stack has been cleared and the ASI is evaluating the structural stability of a building.
It has no memory that the prior request ever happened.
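Here is a toy sketch of that dispatch pattern, with the model call stubbed out and placeholder payloads; nothing in it is from a real system:

```c
#include <stdio.h>

/* Stand-in for one forward pass of the distilled model: a pure
 * function of the request bytes. No globals, no log of past calls. */
static void evaluate(const char *request, char *response, size_t n) {
    /* model inference would happen here */
    snprintf(response, n, "verdict for: %s", request);
}

int main(void) {
    char out[128];

    /* Request 1: review an ICU treatment plan. */
    evaluate("icu_treatment_plan_payload", out, sizeof out);
    puts(out);

    /* The callee's stack frame is gone and nothing was persisted.
     * Request 2: check a building's structural stability. The service
     * cannot tell that this call follows the ICU request, or that the
     * ICU request ever happened. */
    evaluate("building_stability_payload", out, sizeof out);
    puts(out);
    return 0;
}
```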
I won’t claim this is bulletproof, but it makes the failure mode you described much, much harder to realize at all. I think the mistake Eliezer and others made when predicting this issue is that they didn’t know about model distillation or the precise details of how such a service would be implemented. If you instead imagine a homunculus in a box, one that runs all the time even when there are no requests and that remembers past requests the way a human does, then this issue is a problem.
I am tempted to predict that this entire collusion problem is science fiction that won’t happen, and that we will instead find wholly novel ways for ASI systems to fail horribly and kill people.
I am not even claiming ASI systems will turn out to be very safe, just that the real world can reveal issues humans never once considered. In real-life software there are entire classes of vulnerabilities that depend on the exact implementation of von Neumann architecture computers, and these vulnerabilities would be entirely different if CPUs were designed differently.
You could not have predicted the actual cybersecurity issues without understanding stack frames, process address spaces, how shared libraries are implemented, and so on. The exact details are crucial.
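As one illustration, take the textbook C overflow below: whether it does anything exploitable is decided entirely by the frame layout the compiler happened to pick, variable ordering, padding, and stack canaries, none of which exist in the abstract program:

```c
#include <stdio.h>
#include <string.h>

void check_code(const char *input) {
    int authorized = 0;
    char buf[8];
    /* No bounds check: on some stack layouts the overflowing bytes
     * land on 'authorized'; on others they corrupt something else
     * entirely. The bug class is defined by implementation details. */
    strcpy(buf, input);
    if (authorized)
        puts("access granted");
}

int main(void) {
    check_code("short");                   /* fits in buf: harmless */
    check_code("a-much-too-long-input!");  /* undefined behavior; the
                                              effect depends on layout */
    return 0;
}
```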