Joe Collman comments on Debate AI and the Decision to Release an AI

Joe Collman 13 May 2020 4:28 UTC
3 points
Debate between two AIs of equivalent strength to A certainly seems useful.

For the variants, I’m not proposing they ever get run. You’d have D create {A′, A″, A‴, ...}, and have B debate C over whether each version of A was safe, without running any of them (the thinking being that inserting a subtle flaw is easier than finding one). I suppose you’d still have the potential for exploits through the variants’ data, though.

I do actually think D’s variants would pose a greater risk than A here. Any security exploits would exist for both, but D would be optimising harder than A’s designers to find them (that’s assuming you’re not specifically training A to pass the B-vs-C debate, which would seem to be a bad idea).
- Chris_Leong 13 May 2020 5:18 UTC
  2 points
  “For the variants, I’m not proposing they ever get run”: that makes sense.