The first line of defence is to avoid training models that have sufficient dangerous capabilities and misalignment to pose extreme risk. Sufficiently concerning evaluation results should warrant delaying a scheduled training run or pausing an existing one.
It’s very disappointing to me that this sentence doesn’t say “cancel”. As far as I understand, most people on this paper agree that we do not have alignment techniques to align superintelligence. Therefore, if the model evaluations predict an AI that is sufficiently smarter than humans, the training run should be cancelled.
Sure. Fwiw I read “delay” and “pause” as stop until it’s safe, not stop for a while and resume while the eval result is still concerning, but I agree being explicit would be nice.
Yeah, this is fair, and later in the section they say:
Careful scaling. If the developer is not confident it can train a safe model at the scale it initially had planned, they could instead train a smaller or otherwise weaker model.
Which is good, supports your interpretation, and gets close to the thing I want, albeit less explicitly than I would have liked.
I still think the “delay/pause” wording pretty strongly implies that the default is to wait for a short amount of time, and then keep going at the intended capability level. I think there’s some sort of implicit picture that the eval result will become unconcerning in a matter of weeks to months, which I just don’t see the mechanism for short of actually good alignment progress.