I think being encrypted may not actually help much with the control problem. The problem isn’t that we expect an AI to fully understand what we want and then be evil; it’s that we’re worried an AI will not be optimizing for what we want. The AI not knowing what its outputs actually do doesn’t seem like it would help with that at all (except that the AI would only have the inputs we want it to have).
That seems to me to be the advantage. If it can’t read the data in its sensors without the key, it can’t know if any given action will increase or decrease its utility.