GPT-3 is not dangerous to train in a box (and in fact is normally trained without I/O faculties, in a way that makes boxing not particularly meaningful), because its training procedure doesn’t involve any I/O complicated enough to introduce infosec concerns, and because it isn’t smart enough to do anything particularly clever with its environment. It only predicts text, and there is no possible text that will do anything weird without someone reading it.
With things-much-more-powerful-than-GPT-3, infosec of the box becomes an issue. For example, you could be training on videogames, and some videogames have buffer overflows that are controllable via the gamepad. You'll also tend to wind up with log file data that's likely to get viewed by humans, who could be persuaded into adding I/O channels that give it access to the internet.
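To make the gamepad attack surface concrete, here's a minimal C sketch of the kind of input-handling bug involved. This is a hypothetical illustration, not code from any real game: a fixed-size stack buffer gets filled from a controller-supplied record without a length check, so an agent that fully controls the gamepad inputs controls what gets written past the buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical controller input record whose length field is trusted.
 * Real gamepad-triggered exploits follow this general shape, but this
 * struct and handler are illustrative, not drawn from any specific title. */
typedef struct {
    uint8_t length;        /* agent-controlled: number of payload bytes */
    uint8_t payload[255];  /* agent-controlled: the bytes themselves    */
} pad_record_t;

void handle_pad_record(const pad_record_t *rec) {
    uint8_t buf[16];  /* fixed-size stack buffer */
    /* BUG: rec->length can be up to 255, so this copy can overflow buf
     * and smash the stack. An agent choosing controller inputs could use
     * this to run code on the machine hosting the game environment.
     * The fix is a bounds check:
     *   size_t n = rec->length < sizeof buf ? rec->length : sizeof buf;
     *   memcpy(buf, rec->payload, n);                                   */
    memcpy(buf, rec->payload, rec->length);
    /* ... interpret buf as a combo/macro ... */
    (void)buf;
}

int main(void) {
    /* Benign input; a hostile agent's input would not be. */
    pad_record_t rec = { .length = 4, .payload = { 1, 2, 3, 4 } };
    handle_pad_record(&rec);
    return 0;
}
```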
At really high levels of intelligence and amounts of computation, far above GPT-3 or anything likely to be feasible today, you can run into "thoughtcrime", where an AI's thoughts contain morally significant agents.
I wrote a paper on the infosec aspects of this, describing how you would set things up if you thought you had a genuine chance of producing superintelligence and needed to run some tests before you let it do things: https://arxiv.org/abs/1604.00545