Seems like a reasonable idea. To implement this, I’d have to look more carefully at exactly what types of mistakes GPT-4o makes to calibrate what should/shouldn’t be dithered. (Additional programs are cheap, but you can easily get a combinatorial explosion with this sort of thing.)
(I’m not currently working on ARC-AGI methods and I might not ever return to this, so don’t count on me trying this!)
Can I ask why you “might not ever return to this”? I’ve just recently discovered the ARC challenge and just went through Chollet’s “On the Measure of Intelligence” and I’m tempted to go deeper into the rabbit hole. Just curious to know if your motivations for not returning are strictly personal or if you think this is a wild goose chase.
If you have N locations that you want to perturb, then if you try a single off-by-one perturbation at a time, this adds 2N programs. With two at a time, this adds N(2N−1) programs.
There’s a possible optimization: only try this on tasks where no unperturbed program was found (<28%).
EDIT: Ironically, I made an off-by-one error, which Program Dithering would have fixed: This should be N(2N−2)=2N(N−1)
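For anyone who wants to check those counts, here is a minimal sketch. It assumes each perturbable location is an integer constant that can be shifted by +1 or −1, and that a "two at a time" variant changes two distinct locations. The names `dithered_offsets` and `count_dithered` are made up for illustration and aren't from any existing ARC-AGI code.

```python
from itertools import combinations, product


def dithered_offsets(n_locations, k):
    """Yield every way to shift exactly k distinct locations by +1 or -1.

    Each result is a tuple of n_locations offsets drawn from {-1, 0, +1},
    where 0 means the location is left unperturbed.
    """
    for locations in combinations(range(n_locations), k):
        for signs in product((-1, 1), repeat=k):
            offsets = [0] * n_locations
            for loc, sign in zip(locations, signs):
                offsets[loc] = sign
            yield tuple(offsets)


def count_dithered(n_locations, k):
    """Count the extra programs produced by perturbing k locations at a time."""
    return sum(1 for _ in dithered_offsets(n_locations, k))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_dithered(n, 1), count_dithered(n, 2))
```

Brute-force enumeration agrees with 2N for single perturbations and with the corrected 2N(N−1) for pairs. The earlier N(2N−1) also counts the N "pairs" that apply both +1 and −1 to the same location.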