Fabien Roger comments on When can we trust model evaluations?

Fabien Roger 6 Dec 2024 7:22 UTC
This post is a great explainer of why prompt-based elicitation is insufficient, why iid-training-based elicitation can be powerful, and why RL-based elicitation is powerful but may still fail. It also has the merit of being relatively short (which might not have been the case if someone else had introduced the concept of exploration hacking). I refer to this post very often.