aogara comments on Benchmarking LLM Agents on Kaggle Competitions

aogara 29 May 2024 23:00 UTC
2 points
Very cool, thanks! This paper focuses on building a DS Agent, but I’d be interested to see a version of this paper that focuses on building a benchmark. It could evaluate several existing agent architectures, benchmark them against human performance, and leave significant room for improvement by future models.