When I wrote that, I wasn’t thinking so much about evals / model organisms as stuff like:
- putting a bunch of agents in a simulated world and seeing how they interact
- weak-to-strong / easy-to-hard generalization
Basically, stuff along the lines of “when you put agents in situation X, they tend to do Y”, rather than work that tries to understand latent causes / capabilities.