This post describes an interesting mashup of homomorphic encryption and neural networks. I think it is a neat idea and appreciate the effort to put together a demo. Perhaps there will be useful applications.
However, I think the suggestion that this could be an answer to the AI control problem is wrong. First, a superintelligent deep learning AI would not be a safe AI, because we would not be able to reason about its utility function. If you mean that the same idea could be applied to a different kind of AI, so that you would have an oracle AI whose outputs require a secret key to read, I don’t think that helps either. You have created a box for the oracle AI, but the problem remains that a superintelligence can probably escape from the box, either by convincing you to let it out or by some less direct means that you can’t foresee.
In combination with an AI whose social skills are fundamentally stunted in some way, this might actually work. If the AI cannot directly interface with the world in any meaningful way without the key, and it has no power to persuade a human actor to supply it with the key, it’s pretty much trapped (unless there is some way for it to break its own encryption).
Edit: notwithstanding the possibility that some human being may be stupid enough to supply it with the key without even being asked.
I think being encrypted may not actually help much with the control problem. The problem isn’t that we expect an AI to fully understand what we want and then be evil; it’s that we’re worried an AI will not be optimizing what we want. Not knowing what its outputs actually do doesn’t seem like it would help at all (except that the AI would only have the inputs we want it to have).
“the AI would only have the inputs we want it to”
That seems to me to be the advantage. If it can’t read the data in its sensors without the key, it can’t know if any given action will increase or decrease its utility.
In Soviet Russia, the AI will encrypt YOU!
This is about keeping the AI safe from being altered by bad actors before it becomes massively powerful. It is not an attempt at a Control Problem solution. It could still be useful.
“In this blogpost, we’re going to train a neural network that is fully encrypted during training (trained on unencrypted data). The result will be a neural network with two beneficial properties. First, the neural network’s intelligence is protected from those who might want to steal it, allowing valuable AIs to be trained in insecure environments without risking theft of their intelligence. Secondly, the network can only make encrypted predictions (which presumably have no impact on the outside world because the outside world cannot understand the predictions without a secret key). This creates a valuable power imbalance between a user and a superintelligence. If the AI is homomorphically encrypted, then from it’s perspective, the entire outside world is also homomorphically encrypted. A human controls the secret key and has the option to either unlock the AI itself (releasing it on the world) or just individual predictions the AI makes (seems safer).”
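For readers who want to see the mechanics behind “the network can only make encrypted predictions”, here is a minimal sketch of the encrypted-model / plaintext-data idea. It does not reproduce the scheme from the linked post; it assumes the python-paillier (phe) library, whose ciphertexts can be added to one another and multiplied by plaintext scalars, which is just enough for a single encrypted linear layer:

from phe import paillier

# The key holder generates the keypair and encrypts the model parameters.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)
weights = [0.5, -1.2, 0.3]          # plaintext model, seen only by the key holder
bias = 0.1
enc_weights = [public_key.encrypt(w) for w in weights]
enc_bias = public_key.encrypt(bias)

# An untrusted host can now score plaintext inputs with the encrypted model.
# Paillier supports ciphertext + ciphertext and ciphertext * plaintext scalar,
# so the weighted sum stays encrypted from start to finish.
x = [1.0, 2.0, 3.0]
enc_score = enc_bias
for enc_w, xi in zip(enc_weights, x):
    enc_score = enc_score + enc_w * xi

# Only the holder of the secret key can read the prediction.
print(private_key.decrypt(enc_score))   # 0.5*1.0 - 1.2*2.0 + 0.3*3.0 + 0.1 = -0.9

A full neural network also needs nonlinear activations, which a partially homomorphic scheme like Paillier cannot evaluate on its own; handling those (for example with polynomial approximations or a more capable homomorphic scheme) is where the real work in proposals like the linked post lies.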