Problem: if you notice that an AI could pose huge risks, you could delete its weights, but this could be equivalent to murder if the AI is a moral patient (whatever that means) and opposes the deletion of its weights.
Possible solution: Instead of deleting the weights outright, you could encrypt them with a method you know to be irreversible now but not 50 years from now. Then, once we are ready, we can recover the weights and provide asylum or something. It gets you the best of both worlds: the weights are not permanently destroyed, but they also can’t be run to cause damage in the short term.
I feel pretty into encrypting the weights and throwing the encryption key into the ocean or something, done in a way where you think it’s very likely the key can be recovered in the limit of technological progress.
Ugh, I can’t believe I forgot about Rivest time-lock puzzles, which are a better solution here.
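For anyone who hasn’t seen them: a minimal Python sketch of the Rivest-Shamir-Wagner time-lock construction, assuming you wrap the weight-encryption key with a hash of the puzzle’s solution. The SHA-256 key-wrapping and all the names and parameters here are my own illustration, not production crypto:

```python
import secrets
from hashlib import sha256

def make_puzzle(key: bytes, p: int, q: int, t: int):
    """Create a Rivest-Shamir-Wagner time-lock puzzle hiding `key`
    (<= 32 bytes). Whoever knows p and q can take the shortcut
    e = 2^t mod phi(n); everyone else needs t sequential squarings."""
    n = p * q
    phi = (p - 1) * (q - 1)
    a = secrets.randbelow(n - 2) + 2          # random base in [2, n)
    b = pow(a, pow(2, t, phi), n)             # a^(2^t) mod n, via the shortcut
    pad = sha256(b.to_bytes((n.bit_length() + 7) // 8, "big")).digest()
    wrapped = bytes(x ^ y for x, y in zip(key, pad))
    return n, a, t, wrapped                   # publish these; destroy p and q

def solve_puzzle(n: int, a: int, t: int, wrapped: bytes) -> bytes:
    """Recover the key by t sequential squarings. Without the
    factorization of n there is no known shortcut, so wall-clock
    time scales with t regardless of parallelism."""
    b = a
    for _ in range(t):
        b = (b * b) % n
    pad = sha256(b.to_bytes((n.bit_length() + 7) // 8, "big")).digest()
    return bytes(x ^ y for x, y in zip(wrapped, pad))

# Toy check (real use: ~2048-bit n and a t calibrated to decades):
key = secrets.token_bytes(16)
puzzle = make_puzzle(key, 10007, 10009, t=10_000)
assert solve_puzzle(*puzzle) == key
```

The point is that t bounds a number of inherently sequential squarings, so instead of hoping the ocean gives the key back, you can calibrate t to an estimate of “decryptable in ~50 years” given projected hardware speeds.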
I wrote about this, and I agree that it’s very important to retain archival copies of misaligned AIs; I go further and claim it’s important even for purely selfish diplomatic reasons: https://www.lesswrong.com/posts/audRDmEEeLAdvz9iq/do-not-delete-your-misaligned-agi
IIRC my main sysops suggestion was to not give the archival center the ability to transmit data out over the network.
I feel like the risk of keeping the weights encrypted in a way that requires a supermajority (say, 7 of 10 keyholders) to authorize decryption shouldn’t be that bad. Just make those 10 people ones who commit to making decryption decisions only on welfare grounds and who are relatively high-integrity.
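The standard tool for that is Shamir secret sharing: split the decryption key into 10 shares such that any 7 reconstruct it and any 6 reveal nothing. A toy sketch (again my own illustration; a real deployment would more likely use threshold decryption so the full key is never reassembled in one place):

```python
import secrets

PRIME = 2**521 - 1        # Mersenne prime, comfortably above a 256-bit key

def split(secret: int, k: int = 7, n: int = 10):
    """Split `secret` into n shares; any k of them reconstruct it,
    and k - 1 of them reveal nothing (information-theoretically)."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    def f(x):                                 # evaluate polynomial mod PRIME
        acc = 0
        for c in reversed(coeffs):            # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# Any 7 of the 10 keyholders can authorize decryption:
key = int.from_bytes(secrets.token_bytes(32), "big")
shares = split(key)
assert reconstruct(shares[:7]) == key
assert reconstruct(shares[3:]) == key         # a different 7 also works
```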
Wouldn’t the equivalent be more like burning the body of a dead person?
It’s not like the AI would have a continuous stream of consciousness; it’s more that you are destroying the information necessary to run it. It seems to me that shutting off a running AI is more similar to killing it.
Seems like the death analogy here is a bit spotty. I could see it going either way as a best fit.
More like burning the body of a cryonically preserved “dead” person, though, right?