I’d like to see open-sourced evaluation and safety tools. Seems like a good thing to push on.
I’m in favour of open-source interpretability tools, but not of open-sourcing evaluation tools: I think they would be used to advance capabilities without substantially improving safety.