Debate between two with equivalent strength to A seems useful, certainly.
For the variants, I’m not proposing they ever get run. You’d have D create {A’, A″, A‴...}, and have B debate C over whether each versions of A were safe, without running them (the thinking being that inserting a subtle flaw is easier than finding a subtle flaw). I suppose you’d still have the potential for exploits through the variants’ data though.… I do actually think D’s variants would pose a greater risk than A here. Any security exploits would exist for both, but D would be optimising harder than A’s designers to find them (that’s assuming you’re not specifically training A to pass the B-vs-C debate, which would seem to be a bad idea).
Debate between two with equivalent strength to A seems useful, certainly.
For the variants, I’m not proposing they ever get run. You’d have D create {A’, A″, A‴...}, and have B debate C over whether each versions of A were safe, without running them (the thinking being that inserting a subtle flaw is easier than finding a subtle flaw). I suppose you’d still have the potential for exploits through the variants’ data though.… I do actually think D’s variants would pose a greater risk than A here. Any security exploits would exist for both, but D would be optimising harder than A’s designers to find them (that’s assuming you’re not specifically training A to pass the B-vs-C debate, which would seem to be a bad idea).
“For the variants, I’m not proposing they ever get run”—that makes sense