Sandboxing an AI is trivial; you can just not run it.
My claim is roughly the following:
Removing non-deterministic APIs and making the AI’s execution environment fully deterministic is possible with little or no performance penalty, and doing so sandboxes the AI (see the two sketches below).
The bits that remain accessible to the AI aren’t sufficient to break the sandbox, because they’re essentially random, and the No Free Lunch theorem rules out learning from entirely random data.
Thus, sandboxing arbitrarily powerful AI is possible; equivalently, the question of whether we can prevent the AI from breaking out of the box, no matter how powerful it is, has a positive answer.
That’s a non-trivial claim, since the AI is still running here.
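To make the determinism point concrete, here is a minimal, in-process Python sketch of my own (the patched APIs and function names are illustrative, not taken from any of the linked posts). A real sandbox would enforce this at the OS or hypervisor level; the sketch only shows how cheaply the AI’s visible environment can be made a fixed function of a seed.

```python
# Toy, in-process sketch (my own illustration): replace the obvious sources of
# non-determinism with fixed, seeded stand-ins before running the boxed model.
# A real sandbox would enforce this at the OS/hypervisor level; this just shows
# the idea and why it costs essentially nothing at runtime.
import os
import random
import socket
import time


def _no_network(*args, **kwargs):
    raise PermissionError("network access is outside the sandbox")


def make_deterministic(seed: int = 0) -> None:
    """Swap common non-deterministic APIs for deterministic stand-ins."""
    random.seed(seed)                   # PRNG draws become a function of the seed
    os.urandom = lambda n: bytes(n)     # OS/hardware entropy -> constant bytes
    time.time = lambda: 0.0             # wall clock -> fixed value
    time.perf_counter = lambda: 0.0     # timing side channel -> fixed value
    socket.socket = _no_network         # no network I/O from inside the box


if __name__ == "__main__":
    make_deterministic(seed=0)
    # Everything the boxed code can read from these APIs is now a fixed
    # function of the seed: runs are bit-for-bit repeatable and carry no
    # information about the host machine.
    print(random.random(), time.time(), os.urandom(4))
```

And to illustrate the No Free Lunch point, a second toy sketch (again my own, not from the posts): a predictor that memorizes every k-bit context it has seen still performs at chance on held-out uniformly random bits, which is the sense in which purely random leftover bits can’t be learned from or exploited.

```python
# Tiny check of the No Free Lunch intuition: on uniformly random bits, even a
# predictor that memorizes every k-bit context it has seen stays at chance
# accuracy on held-out data.
import random
from collections import Counter

rng = random.Random(0)
bits = [rng.getrandbits(1) for _ in range(200_000)]
train, test = bits[:100_000], bits[100_000:]
k = 8

counts = {}  # k-bit context -> Counter over the next bit
for i in range(k, len(train)):
    ctx = tuple(train[i - k:i])
    counts.setdefault(ctx, Counter())[train[i]] += 1

hits = total = 0
for i in range(k, len(test)):
    ctx = tuple(test[i - k:i])
    guess = counts.get(ctx, Counter({1: 1})).most_common(1)[0][0]
    hits += int(guess == test[i])
    total += 1

print(f"held-out accuracy: {hits / total:.3f}")  # hovers around 0.5
```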
It’s a non-trivial way to sandbox it, but I am not sure what value that gives us which the trivial way of sandboxing doesn’t give us.
The basic innovation is that AI boxing can be done cheaply and with no performance penalty. So sandboxing can be done surprisingly easily, and AI companies can prevent the AI from knowing dangerous facts as part of their normal business. This is huge, since it lets us iterate on the alignment problem much more safely by always sandboxing the AI.
It also benefits narrow AI approaches, since we can avoid exposing too much dangerous information to the AI.
As a concrete example, Jacob’s alignment plan requires sandboxing in order to work, and I showed that such sandboxing is easy to do, which strengthens his plan.
Here are the links:
https://www.lesswrong.com/posts/WKGZBCYAbZ6WGsKHc/love-in-a-simbox-is-all-you-need
https://www.lesswrong.com/posts/JPHeENwRyXn9YFmXc/empowerment-is-almost-all-we-need
Finally, AI alignment is in a much less precarious position if the AI simply can’t break out of the box, since we can then leverage the power of iteration that science uses to solve hard problems. So long as the AI is confined to the sandbox, the consequences of getting alignment wrong are much, much less severe than they would be with an AI free on the internet.
Another plan that is helped by sandboxes is OpenAI’s alignment plan. Here’s the link:
https://openai.com/blog/our-approach-to-alignment-research/