The basic innovation is that AI boxing can be done cheaply and with no performance penalty. Sandboxing is thus surprisingly easy, and AI companies can prevent AI from knowing dangerous facts as part of their normal business. This is huge, since it lets us iterate much more safely on the alignment problem by always sandboxing AI.
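To make the cheapness claim concrete, here is a minimal sketch of one way to box an AI workload, assuming Docker is available; the image name is a placeholder, and the flags shown are standard Docker options:

```python
# A minimal sketch of "boxing" an AI workload at near-zero cost, assuming
# Docker is installed. The image name is hypothetical.
import subprocess

def run_boxed(image: str = "my-model-server:latest") -> None:
    """Launch the model in a container with no network access.

    `--network=none` cuts all connectivity, and `--read-only` prevents the
    process from persisting anything outside its own memory. Compute runs
    at native speed inside the container, so the isolation itself adds no
    performance penalty.
    """
    subprocess.run(
        [
            "docker", "run", "--rm",
            "--network=none",   # no internet: the boxed AI cannot phone home
            "--read-only",      # no writes to the host filesystem
            image,
        ],
        check=True,
    )

if __name__ == "__main__":
    run_boxed()
```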
It benefits narrow AI approaches as well, since we can avoid exposing too much dangerous information to the AI in the first place.
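As one illustration of withholding dangerous facts, here is a toy sketch that filters a training corpus by keyword before training; the blocklist and corpus are made up, and real data filtering would of course be far more sophisticated:

```python
# A toy sketch of keeping "dangerous facts" out of a model via the simplest
# possible mechanism: filtering the training corpus. The blocklist and the
# example corpus below are purely illustrative.
from typing import Iterable, Iterator

DANGEROUS_TOPICS = ["bioweapon", "zero-day exploit"]  # hypothetical blocklist

def filter_corpus(documents: Iterable[str]) -> Iterator[str]:
    """Yield only documents that mention none of the blocked topics."""
    for doc in documents:
        text = doc.lower()
        if not any(topic in text for topic in DANGEROUS_TOPICS):
            yield doc

# Usage: train only on the filtered stream.
corpus = ["How to bake bread.", "Synthesis route for a bioweapon agent."]
print(list(filter_corpus(corpus)))  # -> ['How to bake bread.']
```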
As a concrete example, Jacob's alignment plan requires sandboxing in order to work, and I showed that such sandboxing is easy to do, which strengthens his plan (links below).
Finally, AI alignment is in a much less precarious position if the AI simply can't break out of the box, since we can then leverage the power of iteration that science uses to solve hard problems. So long as the AI is aligned while in the sandbox, the consequences are far safer than with an AI loose on the internet.
Another plan that is helped by sandboxes is OpenAI's alignment plan (linked below).
It's a non-trivial way to sandbox, but I am not sure what value it provides that the trivial way of sandboxing doesn't.
Here are the links for Jacob's plan:
https://www.lesswrong.com/posts/WKGZBCYAbZ6WGsKHc/love-in-a-simbox-is-all-you-need
https://www.lesswrong.com/posts/JPHeENwRyXn9YFmXc/empowerment-is-almost-all-we-need
And the link for OpenAI's alignment plan:
https://openai.com/blog/our-approach-to-alignment-research/