Sparseness. This idea occurred to me while noticing how some of my coworkers behave.
Suppose that, through ruthless winnowing of model size and system complexity during training, the AI only has the cognitive resources to perform well on its task: it is the smallest and simplest system that performs acceptably well. (We don't make it ten times bigger for 0.1% more performance.)
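To make the winnowing picture concrete, here is a minimal sketch using iterative magnitude pruning in PyTorch: keep deleting the smallest weights until performance would fall below an acceptability bar, then stop. The `evaluate` callback, the `acceptable_score` threshold, and the restriction to linear layers are placeholders of mine, not anything specified above; real training-time sparsification would be more involved.

```python
import copy

import torch.nn as nn
import torch.nn.utils.prune as prune


def winnow(model: nn.Module, evaluate, acceptable_score: float,
           step: float = 0.1, max_rounds: int = 50) -> nn.Module:
    """Return the sparsest variant of `model` that still scores acceptably.

    `evaluate` is a hypothetical callback mapping a model to a scalar score.
    """
    for _ in range(max_rounds):
        candidate = copy.deepcopy(model)
        layers = [(m, "weight") for m in candidate.modules()
                  if isinstance(m, nn.Linear)]
        # Zero out the `step` fraction of weights with the smallest
        # magnitudes, pooled globally across all linear layers.
        prune.global_unstructured(layers,
                                  pruning_method=prune.L1Unstructured,
                                  amount=step)
        if evaluate(candidate) < acceptable_score:
            break  # one cut too far: keep the previous, still-acceptable model
        model = candidate
    return model
```

The stopping rule is the whole point: the loop never pays capacity for score it doesn't need, which is the "we don't make it ten times bigger for 0.1% more performance" constraint stated above.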
These kinds of systems don't have spare capacity for deception, for planning extremely complex behaviors with very long-term payoffs, and so on. All of their weights or cognitive subsystems (future AIs will probably not be one to three neural networks but many separate systems, some of which won't use NNs at all) go solely toward the things that earned more score in the training environment.
So long as the training environment provided no significant reward for deception, the system has no machinery that grants it deception.