I do think it’s pretty plausible we could develop techniques that yield a very smooth impairment/capability curve.
Sneak peek of my recent work on this in GPT-2-xl:
[The labels correspond to the topics of the datasets tested. I’d hoped for more topic-specificity in the impairment, but it is at least nice and smooth. I’m still hopeful that combining my work with Garret’s could greatly improve the topic-specificity of the impairment.]
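For concreteness, here is a minimal sketch of the general idea of sweeping an impairment knob and checking how smoothly per-topic capability degrades. The noise injection into MLP weights is just a stand-in for the actual technique, and the topic texts are placeholders, not the datasets from the plot:

```python
# Sketch only: inject Gaussian noise of increasing scale into GPT-2-XL's MLP
# weights and track loss on a few topic-specific text samples, to see how
# smoothly (and how topic-specifically) capability degrades.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl").to(device).eval()

# Placeholder topic samples; in practice these would be the evaluation
# datasets whose labels appear in the plot above.
topic_texts = {
    "topic_a": "Example passage about the first topic...",
    "topic_b": "Example passage about the second topic...",
}

def loss_on(text: str) -> float:
    """Average next-token loss of the (possibly impaired) model on a sample."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

# Snapshot the clean MLP weights so every noise level starts from the same model.
mlp_params = [p for name, p in model.named_parameters() if "mlp" in name]
clean = [p.detach().clone() for p in mlp_params]

for scale in [0.0, 0.005, 0.01, 0.02, 0.04]:
    with torch.no_grad():
        for p, c in zip(mlp_params, clean):
            p.copy_(c + scale * torch.randn_like(c))  # the impairment knob
    losses = {topic: loss_on(text) for topic, text in topic_texts.items()}
    # A smooth impairment/capability curve = losses rise gradually with scale,
    # rather than jumping at some threshold.
    print(scale, losses)
```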
Regarding the safety of training and evaluation:
I think that training can be made safer than testing, which in turn can be made MUCH safer than deployment. Therefore, I’m not too worried about phase changes within training, so long as the training is done with reasonable precautions.