I don’t have strong opinions on an A vs. B debate or a B vs. C debate. That was a detail I wasn’t paying much attention to. I was just proposing using two AI’s with equivalent strengtht to A. One worry I have about making D create variants with known flaws would be if any of these exploited security holes, although maybe a normal AGI, being fully general, would be able to exploit security holes anyway.
Debate between two with equivalent strength to A seems useful, certainly.
For the variants, I’m not proposing they ever get run. You’d have D create {A’, A″, A‴...}, and have B debate C over whether each versions of A were safe, without running them (the thinking being that inserting a subtle flaw is easier than finding a subtle flaw). I suppose you’d still have the potential for exploits through the variants’ data though.… I do actually think D’s variants would pose a greater risk than A here. Any security exploits would exist for both, but D would be optimising harder than A’s designers to find them (that’s assuming you’re not specifically training A to pass the B-vs-C debate, which would seem to be a bad idea).
I don’t have strong opinions on an A vs. B debate or a B vs. C debate. That was a detail I wasn’t paying much attention to. I was just proposing using two AI’s with equivalent strengtht to A. One worry I have about making D create variants with known flaws would be if any of these exploited security holes, although maybe a normal AGI, being fully general, would be able to exploit security holes anyway.
Debate between two with equivalent strength to A seems useful, certainly.
For the variants, I’m not proposing they ever get run. You’d have D create {A’, A″, A‴...}, and have B debate C over whether each versions of A were safe, without running them (the thinking being that inserting a subtle flaw is easier than finding a subtle flaw). I suppose you’d still have the potential for exploits through the variants’ data though.… I do actually think D’s variants would pose a greater risk than A here. Any security exploits would exist for both, but D would be optimising harder than A’s designers to find them (that’s assuming you’re not specifically training A to pass the B-vs-C debate, which would seem to be a bad idea).
“For the variants, I’m not proposing they ever get run”—that makes sense