I also expect that if we did develop some neat new elicitation technique we thought would trigger yellow-line evals, we’d re-run them ahead of schedule.
[...]
I also think people might be reading much more confidence into the 30% than is warranted; my contribution to this process included substantial uncertainty about what yellow-lines we’d develop for the next round.
Thanks for these clarifications. I didn’t realize that the 30% was for the new yellow-line evals rather than the current ones.
Since triggering a yellow-line eval requires pausing until we either have safety and security mitigations in place or design a better yellow-line eval with a higher ceiling, doing so only risks the costs of pausing when we could have instead prepared mitigations or better evals.
I’m having trouble parsing this sentence. What do you mean by “doing so only risks the costs of pausing when we could have instead prepared mitigations or better evals”? Doesn’t pausing include focusing on mitigations and evals?
Thanks for these clarifications. I didn’t realize that the 30% was for the new yellow-line evals rather than the current ones.
That’s how I was thinking about the predictions I was making; others might have been thinking about the current evals, where those were more stable.
I’m having trouble parsing this sentence. What do you mean by “doing so only risks the costs of pausing when we could have instead prepared mitigations or better evals”? Doesn’t pausing include focusing on mitigations and evals?
Of course, but pausing also means we’d have to shuffle people around, interrupt other projects, and deal with a lot of other disruption (the costs of pausing). Ideally, we’d continue updating our yellow-line evals to stay ahead of model capabilities until mitigations are ready.
Thanks for the response!