Daniel Tan comments on Which evals resources would be good?

Daniel Tan 17 Nov 2024 18:39 UTC
As someone with very little working knowledge of evals, I think the following open-source resources would be useful for pedagogy:
- A brief overview of the field covering central concepts, goals, challenges
- A list of starter projects for building skills / intuition
- A list of more advanced projects that address timely / relevant research needs
Maybe similar in style to https://www.neelnanda.io/mechanistic-interpretability/quickstart
It’s also hard to overstate the importance of tooling that is:
- Streamlined: i.e. handles most relevant concerns by default, in a reasonable way, such that new users won’t trip on them (e.g. for evals tooling, it would be good to have simple and reasonably effective elicitation strategies available off-the-shelf)
- Well-documented: both at an API level, and with succinct end-to-end examples of doing important things
I suspect TransformerLens + associated Colab walkthroughs have had a huge impact in popularising mechanistic interpretability.