@ryan_greenblatt’s approach also asks GPT-4o to improve its previous guesses.
These calls are expensive though.
The idea of Program Dithering is to generate many candidate programs cheaply.
Agree overall, but you might be able to use a notably cheaper model (e.g. GPT-3.5) to dither.
If GPT-4o made the off-by-one error, is it reasonable to expect GPT-3.5 to spot it?
No, but it doesn’t need to spot errors; it only needs to flag places that could plausibly be bugs.
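To make the idea concrete, here is a rough sketch of what dithering for off-by-one errors might look like. It is not anyone's actual implementation from this thread. The names `dither`, `passes`, and `transform` are made up for illustration, and treating every integer literal as a "plausible bug site" is a stand-in assumption for whatever a cheap model would flag. It needs Python 3.9+ for `ast.unparse`.

```python
import ast

def dither(source, deltas=(-1, 1)):
    """Yield variants of `source` with one integer literal nudged by a delta."""
    tree = ast.parse(source)
    # Assumption: every integer literal is a plausible bug site. A cheap model
    # could supply a shorter list of suspicious locations instead.
    sites = [
        i for i, node in enumerate(ast.walk(tree))
        if isinstance(node, ast.Constant) and type(node.value) is int
    ]
    for site in sites:
        for delta in deltas:
            # Re-parse so each variant changes exactly one literal.
            variant = ast.parse(source)
            node = list(ast.walk(variant))[site]
            node.value += delta
            yield ast.unparse(variant)

def passes(source, examples, func_name="transform"):
    """Return True if the program maps every example input to its output."""
    namespace = {}
    try:
        # exec is only acceptable here because this is a toy; candidate
        # programs from a model should be run in a sandbox.
        exec(source, namespace)
        return all(namespace[func_name](x) == y for x, y in examples)
    except Exception:
        return False

# Hypothetical candidate with an off-by-one error: it should drop two items.
buggy = "def transform(xs):\n    return xs[1:]\n"
examples = [([1, 2, 3, 4], [3, 4]), ([5, 6, 7], [7])]

fixed = [v for v in dither(buggy) if passes(v, examples)]
```

Each variant costs a parse and a test run rather than a model call, which is where the savings would come from. The cheaper model's only job in this picture would be to narrow down `sites`.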