DanielFilan comments on DanielFilan’s Shortform Feed

DanielFilan 14 Nov 2024 17:40 UTC
LW: 4 AF: 3
When I wrote that, I wasn’t thinking so much about evals / model organisms as stuff like:
- putting a bunch of agents in a simulated world and seeing how they interact
- weak-to-strong / easy-to-hard generalization
Basically, stuff along the lines of “when you put agents in X situation, they tend to do Y thing”, rather than trying to understand latent causes / capabilities.