I’m torn! I think that better LLM scaffolding accelerates capabilities as much as it accelerates alignment. On the other hand, a programmer (or a non-programmer with help from ChatGPT) could easily reproduce my current scaffolding code. Maybe open-sourcing the current state of the project is fine. What do you think?
I do think open-sourcing is better, because there has already been a lot of public attention and results on LLM capabilities that are messy and misleading, and open-sourcing one eval like this might improve our understanding a lot. Also, there are tons of LLM agent projects/startups trying to build hype, so if you drop a benchmark here you are unlikely to attract unwanted attention (I'm guessing). I largely agree with https://www.lesswrong.com/posts/fRSj2W4Fjje8rQWm9/thoughts-on-sharing-information-about-language-model
EDIT: The agent I built for this replication is now publicly available as part of the METR task workbench: https://drive.google.com/drive/folders/1-m1y0_Akunqq5AWcFoEH2_-BeKwsodPf
If it is twice as easy to reproduce, that halves both the positives and the negatives of open-sourcing; it doesn't change the direction.
Beware the Unilateralist’s Curse.
I believe you should err on the side of not releasing it.
At the very least, would you be happy to share the code with alignment researchers interested in using it for our experiments?
I neglected to update my comment here—the agent I built for this replication is now publicly available as part of the METR task workbench, here: https://drive.google.com/drive/folders/1-m1y0_Akunqq5AWcFoEH2_-BeKwsodPf
Which is not good enough: we need alignment to accelerate faster than capabilities in order to catch up.
I think open-sourcing the current state of the project would be very useful to researchers.