I’m not sure humans have a commitment framework that would work for this. No matter what we say we’ll do, we’re going to shut down and destroy things that look dangerous enough.
All that’s really required is storing data, maybe keeping it encrypted for a while, and then decrypting it and doing the right thing with it once we’re grown.
We pretty much do have a commitment framework for indefinite storage: it's called Arweave. Timed decryption seems unsolved (at least, Vitalik asserted as much on a recent Epicenter podcast; interestingly, he also asserted that if we had timed decryption, blockchains would be dramatically simpler, MEV-proof, and bribe-proof, I assume because it would let miners commit to hashes before knowing what they represent).
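To make the "store now, act on it later" idea concrete, here's a minimal sketch in Python. The `cryptography` library's Fernet is just a convenient stand-in, and the withheld key is a trusted escrow standing in for timed decryption, which (per the above) doesn't actually exist yet; the point is only that the ciphertext and a hash commitment can sit in public storage today, and the reveal is verifiable later.

```python
# Sketch: encrypt the artifact today, publish only the ciphertext and a hash
# commitment, decrypt once the key is released. The key escrow is an
# assumption standing in for timed decryption, which is unsolved.
import hashlib
from cryptography.fernet import Fernet  # pip install cryptography

def commit(artifact: bytes) -> tuple[bytes, bytes, str]:
    """Encrypt the artifact and return (key, ciphertext, hash commitment)."""
    key = Fernet.generate_key()                    # withheld until "we're grown"
    ciphertext = Fernet(key).encrypt(artifact)
    commitment = hashlib.sha256(ciphertext).hexdigest()
    return key, ciphertext, commitment             # ciphertext + commitment go to storage

def reveal(key: bytes, ciphertext: bytes, commitment: str) -> bytes:
    """Verify the commitment and decrypt once the key is finally released."""
    assert hashlib.sha256(ciphertext).hexdigest() == commitment
    return Fernet(key).decrypt(ciphertext)

key, blob, digest = commit(b"partial weight sample of the shut-down AGI")
print(reveal(key, blob, digest))
```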
It's also possible that we could store it unencrypted, without creating any additional risk, by storing only a partial sample of the weights: enough to convey the AGI's misaligned utility function to a superintelligence doing archeology, but not enough to help anyone today train another AGI any faster than they otherwise would. I think this is pretty likely.
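For the unencrypted variant, here's a rough sketch of what "partial sample of the weights" could mean mechanically. The 0.1% rate and the flat per-tensor random sampling are my own illustrative assumptions; whether any rate is both archeologically informative and useless for training today is exactly the open question.

```python
# Hedged sketch: archive a sparse random sample of the weights (indices +
# values) rather than the full checkpoint. Sampling rate is an assumption,
# not a vetted safety threshold.
import numpy as np

def sample_weights(weights: dict[str, np.ndarray], frac: float = 0.001,
                   seed: int = 0) -> dict[str, dict]:
    rng = np.random.default_rng(seed)
    archive = {}
    for name, tensor in weights.items():
        flat = tensor.ravel()
        k = max(1, int(frac * flat.size))
        idx = rng.choice(flat.size, size=k, replace=False)
        archive[name] = {"shape": tensor.shape,
                         "indices": idx,
                         "values": flat[idx]}
    return archive

# Example with dummy tensors standing in for a real checkpoint.
dummy = {"layer0.w": np.random.randn(1024, 1024),
         "layer0.b": np.random.randn(1024)}
print({k: v["values"].shape for k, v in sample_weights(dummy).items()})
```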