Potential dangers of future evaluations / gain-of-function research, which I’m sure you and Beth are already extremely well aware of:
Falsely evaluating a model as safe (obviously)
Choosing evaluation metrics that don’t give us enough time to react (after a metric flips from “safe” to “not safe”, we would like enough time to recognize this and do something about it before we’re all dead)
Crying wolf too many times, making it more likely that no one will believe you when a danger threshold has really been crossed
Letting your methods for making future AIs scarier be too strong, given the probability they will be leaked or otherwise made widely accessible (this is a concern insofar as the methods / tools are difficult to replicate without significant resources)
Letting your methods for making AIs scarier be too weak, so that it becomes too easy for bad actors to go much further than you did
Failing to precommit to stopping this research once models get scary enough that it’s on balance best to stop making them scarier, even if no one else believes you yet