Maybe check this paper: DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning, https://arxiv.org/abs/2402.17453
Very cool, thanks! This paper focuses on building a DS agent, but I'd be interested to see a companion paper that focuses on building a benchmark instead: one that evaluates several existing agent architectures, benchmarks them against human performance, and leaves significant headroom for future models to improve on.