I’m confused—unless I’m misunderstanding something, the challenges are presented as JSON data. Where does vision come in? Or are you using vision as a metaphor for the model attempting to create an internal representation of the grid from the JSON data?
[UPDATE—never mind, I had forgotten that the method involved ‘Provide the ARC-AGI problem to GPT-4o, with both an image representation and with various text representations for each grid in the problem.’]