davidad comments on RSPs are pauses done right

davidad 14 Oct 2023 18:32 UTC
LW: 28 AF: 11
I think AI Safety Levels are a good idea, but evals-based classification needs to be complemented by compute thresholds to mitigate the risks of loss of control via deceptive alignment. Here is a non-nebulous proposal.