Potential dangers of future evaluations / gain-of-function research, which I’m sure you and Beth are already extremely well aware of:
Falsely evaluating a model as safe (obviously)
Choosing evaluation metrics that don’t give us enough time to react (after a metric flips from “safe” to “not safe”, we would like enough time to recognize this and do something about it before we’re all dead)
Crying wolf too many times, making it more likely that no one will believe you when a danger threshold has really been crossed
Letting your methods for making future AIs scarier be too strong, given the probability they will be leaked or otherwise made widely accessible (this is a concern insofar as the methods / tools are difficult to replicate without significant resources)
Letting your methods for making AIs scarier be too weak, so that it becomes too easy for bad actors to go much further than you did
Failing to precommit to stopping this research once models get scary enough that it’s on balance best to stop making them scarier, even if no one else believes you yet