If you could only have “partial visibility”, what are some of the things you would most want the government to be able to know?
I have an answer to that: making sure that NIST's AISI gets, at a minimum, the scores from automated evals run on checkpoints of any new large training run, as well as pre-deployment eval access.
Seems like a pretty low-cost, high-value ask to me. Even if that info leaked from AISI, it wouldn’t give away corporate algorithmic secrets.
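To make the first ask concrete, here is a minimal sketch of what a per-checkpoint score report could look like. The schema, field names, and eval names are all hypothetical illustrations, not any actual AISI format; the point is just that scores plus minimal run metadata support trend-tracking without exposing architecture, data mix, or hyperparameters.

```python
from dataclasses import dataclass, field

# Hypothetical schema for a per-checkpoint report. None of these field
# or eval names are real requirements; they illustrate how little lab
# IP such a report needs to contain.
@dataclass
class CheckpointEvalReport:
    run_id: str                   # opaque identifier for the training run
    checkpoint_step: int          # training step the checkpoint was taken at
    training_compute_flop: float  # rough compute spent so far
    scores: dict[str, float] = field(default_factory=dict)  # eval name -> score

report = CheckpointEvalReport(
    run_id="run-2025-03-a",
    checkpoint_step=250_000,
    training_compute_flop=3e25,
    scores={
        "mmlu": 0.81,                 # hypothetical benchmark scores
        "cyber_uplift_suite": 0.34,
        "bio_knowledge_probe": 0.27,
    },
)
```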
A higher-cost but still fairly reasonable ask is pre-deployment evals that require fine-tuning. You can't have a good sense of what the model would be capable of in the hands of bad actors if you don't test fine-tuning it on hazardous info.
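As a rough illustration of what such an eval involves, here is a hedged sketch of the before/after comparison; `finetune` and `run_eval_suite` are made-up stand-ins passed in as callables, not any real evaluator's API.

```python
from typing import Callable

Model = object  # placeholder type for whatever the eval harness passes around

def elicitation_gap(
    base: Model,
    finetune: Callable[[Model], Model],                   # hypothetical: tunes on hazardous data
    run_eval_suite: Callable[[Model], dict[str, float]],  # hypothetical: eval name -> score
) -> dict[str, float]:
    """Score the model before and after adversarial fine-tuning and report
    the uplift per eval. The 'after' score matters because refusal training
    on the released model can mask capability a fine-tune would unlock."""
    base_scores = run_eval_suite(base)
    tuned_scores = run_eval_suite(finetune(base))
    return {name: tuned_scores[name] - base_scores[name] for name in base_scores}
```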