Thanks for the comment! I don’t think you are interpreting this post wrong.
Why would we want to run it if we can’t see what it does?
I don’t think that running a superintelligence is ever useful. What I argue is that running it inside a cryptographic box is better than running it outside of the box. The goal of this project is to ensure that, if a superintelligence is run, it doesn’t automatically destroy the world; the goal isn’t to run a superintelligence to do useful tasks.
Like maybe for n=2^16, it would not be enough to persuade a person to do something that would unbox the AI (but it might).
Personally, I think that allowing it to output 2^16 bits of information is too much. If the superintelligence were to output ASCII text (a very inefficient encoding for the purpose of unboxing itself), then 2^16 bits would correspond to 8192 characters, which seems far more than enough to convince someone to unbox it. Even if the decrypted bits were thoroughly examined by people outside of the box, I expect that n=256 would be too much. It would be interesting to know how much information can be communicated with that many bits (it makes me think of the Arecibo message, which uses 1679 bits).
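To get a feel for these magnitudes, here is a quick back-of-the-envelope sketch (assuming a plain 8-bit ASCII encoding, which is just an illustrative choice, not something the box design prescribes):

```python
# Back-of-the-envelope: how much ASCII text fits in n decrypted bits?
# Assumes a plain 8-bit-per-character encoding (an illustrative assumption,
# not something the cryptographic box itself prescribes).

BITS_PER_ASCII_CHAR = 8

# 2**16 is the value from the quoted comment; 1679 is the size of the
# Arecibo message; 256 is the smaller n discussed above.
for n_bits in (2**16, 1679, 256):
    n_chars = n_bits // BITS_PER_ASCII_CHAR
    print(f"n = {n_bits:>5} bits  ->  roughly {n_chars:>4} ASCII characters")

# Prints:
# n = 65536 bits  ->  roughly 8192 ASCII characters
# n =  1679 bits  ->  roughly  209 ASCII characters
# n =   256 bits  ->  roughly   32 ASCII characters
```

Even the Arecibo message’s 1679 bits correspond to only about 209 ASCII characters, which is part of why ASCII would be such a wasteful encoding for an unboxing attempt.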
Since I think that running a superintelligence is already too dangerous even without intending to decrypt anything, I wouldn’t approve of a team trying to build a superintelligence even with n=0. However, planning for n=0 is still better than not trying to use a cryptographic box at all.
I don’t think having a formal proof should be an objective in and of itself.
I think the objective should be to mitigate the risk of extinction by AI. Developing an unconditional proof of safety seems to be one way to reduce this risk, and this is why I decided to work on cryptographic boxes. However, we shouldn’t rely only on unconditional proofs, since these seem possible only for control (how to prevent a superintelligence from achieving its bad objectives) and nearly impossible for alignment (how to give good objectives to a superintelligence). Moreover, working on alignment seems more promising than working on control in cases where we have a lot of time before the first superintelligence.
If we don’t allow any output bits to come out of the ASI, then how do we know if it worked?
I think that, if the box had holes, we would realize it pretty soon, because the superintelligence would dramatically change the world around us, probably making Earth uninhabitable in the process.