[...] instead I started working to get evals built, especially for situational awareness
I’m curious what happened to the evals you mention here. Did any end up being built? Did they cover, or plan to cover, any ground that isn’t covered by the SAD benchmark?
Some of them ended up being built, or influencing things that got built (e.g. the SAD benchmark and other papers produced by Owain et al.); others have yet to be built. Here’s a list I made while at OpenAI, which I got permission to share: https://docs.google.com/document/d/1pDPvnt6iq3BvP4EkNjchdvhbRtIFpD8ND99vwO3H-XI/edit?usp=sharing