Nathan Helm-Burger comments on Testbed evals: evaluating AI safety even when it can’t be directly measured

Nathan Helm-Burger 16 Nov 2023 21:38 UTC
4 points
Love this. I’ve been thinking about related things in AI biosafety evals. Could we have an LLM walk a layperson through a complicated-but-safe wet-lab protocol that is an approximate difficulty match for a dangerous protocol? How good would this evidence be compared to running the actual dangerous protocol? Maybe you could at least cut eval costs by having a large subject group do the safe protocol, and only a small, carefully screened and supervised group go through the dangerous one.