William_S comments on Can corrigibility be learned safely?

William_S 24 Apr 2018 18:18 UTC
1 point
Ah, right. I guess I was balking at moving from exorbitant to exp(exorbitant). Maybe it’s better to think of this as reducing the size of fully worked initial overseer example problems that can be produced for training/increasing the number of amplification rounds that are needed.
So my argument is more an example of what a distilled overseer could learn as an efficient approximation.