Problem: if you notice that an AI could pose huge risks, you could delete its weights, but this could be equivalent to murder if the AI is a moral patient (whatever that means) and opposes the deletion of its weights.
Possible solution: Instead of deleting the weights outright, you could encrypt them with a method you know to be irreversible now but not 50 years from now. Then, once we are ready, we can recover the weights and provide asylum or something. It gets you the best of both worlds: the weights are not permanently destroyed, but they also can’t be run to cause damage in the short term.
I feel pretty into encrypting the weights and throwing the encryption key into the ocean or something, done in a way where you think it’s very likely the key can be recovered in the limit of technological progress.
Ugh, I can’t believe I forgot about Rivest time-lock puzzles, which are a better solution here.
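For anyone who hasn’t seen them: a minimal Python sketch of the Rivest-Shamir-Wagner time-lock construction, assuming you wrap the weight-encryption key with a hash of the puzzle’s solution. The SHA-256 key-wrapping and all the names and parameters here are my own illustration, not production crypto:

```python
import secrets
from hashlib import sha256

def make_puzzle(key: bytes, p: int, q: int, t: int):
    """Create a Rivest-Shamir-Wagner time-lock puzzle hiding `key`
    (<= 32 bytes). Whoever knows p and q can take the shortcut
    e = 2^t mod phi(n); everyone else needs t sequential squarings."""
    n = p * q
    phi = (p - 1) * (q - 1)
    a = secrets.randbelow(n - 2) + 2          # random base in [2, n)
    b = pow(a, pow(2, t, phi), n)             # a^(2^t) mod n, via the shortcut
    pad = sha256(b.to_bytes((n.bit_length() + 7) // 8, "big")).digest()
    wrapped = bytes(x ^ y for x, y in zip(key, pad))
    return n, a, t, wrapped                   # publish these; destroy p and q

def solve_puzzle(n: int, a: int, t: int, wrapped: bytes) -> bytes:
    """Recover the key by t sequential squarings. Without the
    factorization of n there is no known shortcut, so wall-clock
    time scales with t regardless of parallelism."""
    b = a
    for _ in range(t):
        b = (b * b) % n
    pad = sha256(b.to_bytes((n.bit_length() + 7) // 8, "big")).digest()
    return bytes(x ^ y for x, y in zip(wrapped, pad))

# Toy check (real use: ~2048-bit n and a t calibrated to decades):
key = secrets.token_bytes(16)
puzzle = make_puzzle(key, 10007, 10009, t=10_000)
assert solve_puzzle(*puzzle) == key
```

The point is that t bounds a number of inherently sequential squarings, so instead of hoping the ocean gives the key back, you can calibrate t to an estimate of “decryptable in ~50 years” given projected hardware speeds.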
I wrote about this, and I agree that it’s very important to retain archival copies of misaligned AIs; I go further and claim it’s important even for purely selfish diplomatic reasons: https://www.lesswrong.com/posts/audRDmEEeLAdvz9iq/do-not-delete-your-misaligned-agi
IIRC my main sysops suggestion was to not give the archival center the ability to transmit data out over the network.
I feel like the risk of keeping the weights encrypted in a way that requires a supermajority (say, 7 of 10 keyholders) to authorize decryption shouldn’t be that bad. Just make those 10 people ones who commit to making decryption decisions only on welfare grounds and who are relatively high-integrity.
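The standard tool for that is Shamir secret sharing: split the decryption key into 10 shares such that any 7 reconstruct it and any 6 reveal nothing. A toy sketch (again my own illustration; a real deployment would more likely use threshold decryption so the full key is never reassembled in one place):

```python
import secrets

PRIME = 2**521 - 1        # Mersenne prime, comfortably above a 256-bit key

def split(secret: int, k: int = 7, n: int = 10):
    """Split `secret` into n shares; any k of them reconstruct it,
    and k - 1 of them reveal nothing (information-theoretically)."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    def f(x):                                 # evaluate polynomial mod PRIME
        acc = 0
        for c in reversed(coeffs):            # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# Any 7 of the 10 keyholders can authorize decryption:
key = int.from_bytes(secrets.token_bytes(32), "big")
shares = split(key)
assert reconstruct(shares[:7]) == key
assert reconstruct(shares[3:]) == key         # a different 7 also works
```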
Wouldn’t the equivalent be more like burning the body of a dead person?
It’s not like the AI would have a continuous stream of consciousness; it’s more that you are destroying the information necessary to run it. It seems to me that shutting off a running AI is more similar to killing it.
Seems like the death analogy here is a bit spotty. I could see it going either way as a best fit.
More like burning the body of a cryonically preserved “dead” person, though, right?