Here’s a hypothetical ‘gold standard’ test: we run a big randomized controlled trial to see whether a bunch of non-experts can actually create a (relatively harmless) virus from start to finish. Half the participants would have AI mentors; the other half could only look things up on the internet. We’d give each participant $50K and access to a secure wet lab set up like a garage lab, and make them do everything themselves: find and adapt the correct protocol, purchase the necessary equipment, bypass any know-your-customer checks, and develop the tacit skills needed to run the experiments. Maybe we’d give them three months and pay a bunch of money to anyone who succeeds.
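Before running anything like this, you’d want to know whether the trial could even detect an uplift. Here’s a minimal power-analysis sketch in Python; the arm sizes and success rates are made-up placeholders, not estimates of anything real:

```python
from scipy.stats import fisher_exact
import numpy as np

rng = np.random.default_rng(0)

def trial_power(p_ai, p_ctrl, n_per_arm, alpha=0.05, n_sims=1000):
    """Monte Carlo estimate of the chance a one-sided Fisher's exact
    test detects the AI arm's uplift at significance level alpha."""
    detections = 0
    for _ in range(n_sims):
        ai = rng.binomial(n_per_arm, p_ai)      # successes with AI mentors
        ctrl = rng.binomial(n_per_arm, p_ctrl)  # successes, internet-only
        _, p = fisher_exact(
            [[ai, n_per_arm - ai], [ctrl, n_per_arm - ctrl]],
            alternative="greater",
        )
        detections += p < alpha
    return detections / n_sims

# Placeholder rates: 24% succeed with AI, 8% without. Pure assumptions.
for n in (25, 50, 100):
    print(f"n = {n:>3} per arm -> power ~ {trial_power(0.24, 0.08, n):.0%}")
```

The only point of the sketch is that arm size matters a lot: the smaller the true uplift, the more participants you’d need before the trial could distinguish it from noise.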
What if both the AI group and the control group have high success rates at this test?
Then that seems bad, but it also means the AI is not counterfactual to the harm; adding safeguards to models is probably not the way to get the risk down.
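To make that concrete with made-up numbers: if the internet-only group already succeeds most of the time, even perfect model safeguards can at best remove the gap between the arms, not the baseline risk.

```python
# Illustrative numbers only -- both rates are assumptions for the example.
p_with_ai = 0.60        # hypothetical success rate with an AI mentor
p_internet_only = 0.55  # hypothetical success rate with the internet alone

# Perfect model-level safeguards could at most push the AI arm's rate
# down to the internet-only baseline; they can't touch the baseline itself.
max_reduction = p_with_ai - p_internet_only
print(f"Max risk reduction from model safeguards: {max_reduction:.0%}")
print(f"Residual risk with perfect safeguards:    {p_internet_only:.0%}")
```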