They did run the tests for all models, from Table 1:
(the columns are GPT-4, GPT-4 (no vision), GPT-3.5)