Vladimir_Nesov comments on Run evals on base models too!

Vladimir_Nesov 4 Apr 2024 19:29 UTC
I expect you’d instead need to tune the base model to elicit the relevant capabilities first. So rather than evaluating either a tuned model intended for deployment (which can refuse to display some capabilities) or a base model (which can have difficulty displaying some capabilities), you need to tune the model to be more purely helpful, possibly in a way specific to the tasks it’s going to be evaluated on.