Thank you! I didn’t see your first version of this, but your current version is helpful for the human-specific tests that they’re benchmarked on :)
Is there any information on how long the LLM spent taking the tests? I’d like to compare that with human times. (I realize it can depend on hardware, etc., but I would just like a general idea.)