I don’t know if that’s possible in full generality, or with absolute confidence, but you can’t really know anything with absolute confidence. Here’s my layman’s understanding/thoughts.
Nevertheless, if one system gives another what it claims is its code+current state, then that system should be able to run the first system in a sandbox on its own hardware, predict the system’s behavior in response to any stimulus, and check that against the real world. If both systems share data this way, each can interact with the sandboxed version of the other, and then they can come together later, each already knowing how the simulated interaction with the other went. They can test this, and keep testing it, by monitoring the other’s behavior over time.
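A minimal sketch of the predict-then-check idea, with a toy deterministic function standing in for a real AI (everything here is made up for illustration; a real system obviously wouldn’t be a pure function of a single integer):

```python
def toy_agent(state: int, stimulus: int) -> tuple[int, int]:
    """A toy deterministic 'agent': returns (response, new_state)."""
    response = (state * 31 + stimulus) % 1000
    return response, state ^ stimulus

def predict_via_sandbox(claimed_state: int, stimulus: int) -> int:
    """Run the claimed code+state on our own hardware to predict behavior."""
    response, _ = predict = toy_agent(claimed_state, stimulus)[0], None
    return response

# The other system claims its internal state is 42. We predict its response
# to a stimulus in our sandbox, then compare against its observed behavior.
claimed_state = 42
stimulus = 7
prediction = predict_via_sandbox(claimed_state, stimulus)
actual, _ = toy_agent(claimed_state, stimulus)  # stand-in for observing the real system
print(prediction == actual)  # a match is (weak) evidence the shared state was honest
```

The point is just that if the claimed code+state is honest and the system is deterministic, the sandbox’s predictions and the real behavior should keep agreeing; a single mismatch falsifies the claim.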
Some of my own intuitions about this:
Yes, this would be ‘probabilistic’, and thus a matter of the evidence that AIs would share with each other.
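To make the ‘probabilistic evidence’ framing concrete, here’s a toy Bayesian update on P(the shared state was honest), given whether each sandbox prediction matched observed behavior. The likelihoods are invented numbers for illustration, not anything principled:

```python
def update_trust(prior: float, match: bool,
                 p_match_if_honest: float = 0.99,
                 p_match_if_dishonest: float = 0.5) -> float:
    """One Bayes update on P(honest) after a single prediction check.

    The two likelihoods are made-up: honest sharing almost always yields a
    match; a dishonest sharer might still match by luck half the time.
    """
    if match:
        numerator = p_match_if_honest * prior
        denominator = numerator + p_match_if_dishonest * (1 - prior)
    else:
        numerator = (1 - p_match_if_honest) * prior
        denominator = numerator + (1 - p_match_if_dishonest) * (1 - prior)
    return numerator / denominator

trust = 0.5  # start agnostic
for match in [True, True, True, False, True]:
    trust = update_trust(trust, match)
print(round(trust, 3))
```

Each match nudges trust up a little; a single mismatch craters it, which matches the intuition that one caught lie is far stronger evidence than many confirmations.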
Why, or how, would one system trust that the state (code+data) another shares is honest?
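One standard building block here (not a full answer to the trust question) is a hash commitment: it can’t prove the shared snapshot is honest, but it does pin the claim down, so the sharer can’t quietly swap in a different code+state later. A sketch, with a made-up snapshot format:

```python
import hashlib
import json

def commit(code: str, state: dict) -> str:
    """Hash-commit to a claimed code+state snapshot.

    Canonical JSON (sort_keys=True) ensures the same snapshot always
    produces the same digest, so commitments are comparable.
    """
    blob = json.dumps({"code": code, "state": state}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

snapshot_code = "def act(s, x): ..."          # placeholder, not real agent code
snapshot_state = {"memory": [1, 2, 3]}

c1 = commit(snapshot_code, snapshot_state)
c2 = commit(snapshot_code, snapshot_state)
print(c1 == c2)   # same snapshot -> same commitment

tampered = commit(snapshot_code, {"memory": [1, 2, 4]})
print(c1 == tampered)  # any change to the state changes the commitment
```

So the honesty question reduces, at least partly, to behavioral checking: commit first, then verify that the committed snapshot keeps predicting the other’s actual behavior.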
Sandboxing is (currently) imperfect, tho perhaps sufficiently advanced AIs could actually achieve it? (On the other hand, there are security vulnerabilities that exploit the ‘computational substrate’, e.g. Spectre, so I would guess those would remain a potential vulnerability even for AIs that designed and built their own substrates.) This also seems like it would only help if the sandboxed version could be ‘sped up’ and if the AI running the sandbox could ‘convince’ the sandboxed AI that it’s not sandboxed.
The ‘prototypical’ AI I’m imagining seems like it would be too ‘big’ and too ‘diffuse’ (e.g. distributed) for it to be able to share (all of) itself with another AI. Another commenter mentioned an AI ‘folding itself up’ for sharing, but I can’t understand concretely how that would help (or how it would work either).