I think evals are fantastic (ie obviously a good and correct thing to do; dramatically better than doing nothing) but there is a little bit of awkwardness in terms of deciding how hard to try. You don’t really want to spend a well-funded-startup’s worth of effort to trigger dangerous capabilities (and potentially cause your own destruction), but you know eventually that someone will. I don’t know how to resolve this.
I think evals are fantastic (ie obviously a good and correct thing to do; dramatically better than doing nothing) but there is a little bit of awkwardness in terms of deciding how hard to try. You don’t really want to spend a well-funded-startup’s worth of effort to trigger dangerous capabilities (and potentially cause your own destruction), but you know eventually that someone will. I don’t know how to resolve this.