Anthropic scaring laws
Personally, I think “Discovering Language Model Behaviors with Model-Written Evaluations” is most valuable because of what it demonstrates from a scientific perspective, namely that RLHF and scale make certain forms of agentic behavior worse.