Bogdan Ionut Cirstea comments on Secret Collusion: Will We Know When to Unplug AI?

Bogdan Ionut Cirstea 16 Sep 2024 19:50 UTC
5 points
0
A potential alternative to this may be the use of machine unlearning so as to selectively unlearn data or capabilities.
Any thoughts on how useful existing benchmarks like WMDP-cyber would be, or alternatively on how difficult it would be to develop a similar benchmark tailored more closely to secret collusion?
- schroederdewitt 17 Sep 2024 9:11 UTC
  2 points
  0
  Parent
  That’s a great question. I am not quite sure, but WMDP-cyber does look relevant. If you are interested in working on a new benchmark for unlearning and secret collusion, do reach out to us!