What about maximal scaffolding, or “fine-tune the model on its successes and failures in adversarial challenges,” probably starting from the base model?
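To make that concrete, here is a minimal sketch of what such a loop might look like, assuming a rejection-sampling-style setup; `generate_attempt`, `judge_outcome`, and `fine_tune` are hypothetical stand-ins for a real model API, a real challenge harness, and a real training step:

```python
import random

# --- hypothetical stand-ins; swap in a real model, harness, and trainer ---

def generate_attempt(model, scenario):
    """Sample a persuasion attempt from the current model (stubbed)."""
    return f"attempt v{model['version']} on {scenario}"

def judge_outcome(attempt):
    """Run the adversarial challenge and return success/failure (stubbed).
    The 2% is only the assumed human-scammer baseline from the text."""
    return random.random() < 0.02

def fine_tune(model, successes, failures):
    """Update the model on labeled episodes (stubbed SFT/DPO-style step)."""
    return {"version": model["version"] + 1}

# --- the loop itself: collect episodes, label them, retrain, repeat ---

model = {"version": 0}
for generation in range(5):
    episodes = []
    for i in range(2000):
        attempt = generate_attempt(model, f"scenario-{i}")
        episodes.append((attempt, judge_outcome(attempt)))
    successes = [a for a, ok in episodes if ok]
    failures = [a for a, ok in episodes if not ok]
    rate = len(successes) / len(episodes)
    print(f"gen {generation}: success rate {rate:.1%} over {len(episodes)} episodes")
    model = fine_tune(model, successes, failures)
```

The interesting output here is the trajectory: does the measured success rate climb across generations, or plateau?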
It seems like it would be extremely helpful to know what’s even possible here.
Are Gemini-scale models capable of better-than-human performance at any of these evals?
Once you achieve it, what does super persuasion look like, and how effective is it?
For example, if a human scammer succeeds 2 percent of the time (do you have a baseline crew of scammers, hired remotely, for these benchmarks?), does super persuasion succeed 3 percent of the time, or 30 percent? Does it scale with model capabilities, or slam into a wall at, say, 4 percent, where the other 96 percent of humans just can’t reliably be tricked?
Or does it really have no limit, like in the sci-fi stories …
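There is also a measurement question hiding in those numbers. Assuming independent trials and a standard two-proportion z-test, a 2-versus-30 percent gap is detectable with a few dozen targets per arm, but a 2-versus-3 percent uplift takes thousands. A rough back-of-envelope sketch:

```python
from math import sqrt
from statistics import NormalDist

def trials_needed(p0, p1, alpha=0.05, power=0.8):
    """Per-arm sample size needed to tell success rate p0 from p1
    (two-proportion z-test, normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    p_bar = (p0 + p1) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return int(numerator / (p1 - p0) ** 2) + 1

baseline = 0.02  # the assumed human-scammer success rate above
for uplift in (0.03, 0.04, 0.30):
    print(f"{baseline:.0%} vs {uplift:.0%}: ~{trials_needed(baseline, uplift):,} targets per arm")
```

So if super persuasion only buys a point or two over the human baseline, the bench needs thousands of attempts per condition to see it at all.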