I think these tasks would be in scope as long as these tasks can be done fully digitally and can be done relatively easily in text. (And it’s possible to setup the task in a docker container etc etc.)
Quoting the desiderata from the post:
Plays to strengths of LLM agents: ideally, most of the tasks can be completed by writing code, using the command line, or other text-based interaction.
I think these tasks would be in scope as long as these tasks can be done fully digitally and can be done relatively easily in text. (And it’s possible to setup the task in a docker container etc etc.)
Quoting the desiderata from the post:
Yep, that’s right. And also need it to be possible to check the solutions without needing to run physical experiments etc!