Bruce G comments on Bounty: Diverse hard tasks for LLM agents

Bruce G 20 Dec 2023 6:02 UTC
Thanks for your reply. I found the agent folder you are referring to with ‘main.ts’, ‘package.json’, and ‘tsconfig.json’, but I am not clear on how I am supposed to use it. I just get an error message when I open the ‘main.ts’ file:
Regarding the task.py file, would it be better to have the instructions for the task in comments in the Python file, in a separate text file, or both? Will the LLM be able to run code in the Python file, read the output of the code it runs, and create new cells to run further blocks of code?
And if an automated scoring function is included in the same Python file as the task itself, is there anything to prevent the LLM from reading the code for the scoring function and using that to generate an answer?
I am also wondering whether it would be helpful for me to create a simple “mock task submission” folder and then post or email it to METR to verify that everything is implemented and formatted correctly, just to walk through the task submission process and clear up any further confusion. (This would be a task that could be created quickly, even if a professional might be able to complete it in less than 2 hours, so it is not intended to be part of the actual evaluation.)
- Beth Barnes 20 Dec 2023 16:20 UTC
  Did you try following the instructions in the README.md in the main folder for setting up the Docker container and running an agent on the example task?
  
  I think your computer is reading the .ts extension and thinking it’s a translation file: https://doc.qt.io/qt-6/linguist-translating-strings.html
  But it’s actually a TypeScript file. You’ll need to open it with a text editor instead.
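
  For anyone hitting the same mix-up, here is a minimal Python sketch for checking what kind of `.ts` file you have without relying on the operating system's file association. It assumes that Qt Linguist translation files are XML documents with a `<TS>` root element, while TypeScript sources are plain code. The function names `classify_ts_text` and `describe_ts_file` are hypothetical and are not part of the METR repository.

```python
from pathlib import Path


def classify_ts_text(text: str) -> str:
    """Guess whether the contents of a .ts file are Qt translation XML or source code.

    Heuristic (an assumption, not an official check): Qt Linguist files
    begin with an XML declaration, a TS doctype, or a <TS ...> root element.
    """
    head = text.lstrip()
    xml_markers = ("<?xml", "<!DOCTYPE TS", "<TS")
    if head.startswith(xml_markers):
        return "qt-translation"
    return "source-code"


def describe_ts_file(path: str) -> str:
    """Read a .ts file as plain text and report which kind it appears to be."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return f"{path}: {classify_ts_text(text)}"
```

  Calling `describe_ts_file("main.ts")` from the folder that contains the file reads it as plain text, which is also what a text editor does when it opens the file.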
  
  Yeah, doing a walkthrough of a task submission could be great. I think it’s only useful if you have a decent amount of coding experience, though. If you happen to be a non-coder, there might be quite a lot of explaining required.
  - Bruce G 22 Jan 2024 2:12 UTC
    I have a mock submission ready, but I am not sure how to go about checking if it is formatted correctly.
    
    Regarding coding experience, I know Python, but I do not have experience working with TypeScript or Docker, so I am not clear on what I am supposed to do with those parts of the instructions.
    
    If possible, it would be helpful to go through it in a Zoom meeting so I could share my screen.
    - Beth Barnes 24 Jan 2024 3:08 UTC
      Hey! It sounds like you’re pretty confused about how to follow the instructions for getting the VM set up and testing your task code. We probably don’t have time to walk you through the Docker setup etc., sorry. But maybe you can find someone else who’s able to help you with that?