I have an answer to that: making sure that NIST's AISI received at least the scores from automated evals on checkpoints of any new large training runs, as well as pre-deployment eval access.
Seems like a pretty low-cost, high-value ask to me. Even if that info leaked from AISI, it wouldn’t give away corporate algorithmic secrets.
A higher-cost ask, but still fairly reasonable, is pre-deployment evals which require fine-tuning. You can't have a good sense of what the model would be capable of in the hands of bad actors if you don't test fine-tuning it on hazardous info.