I don’t think this makes any sense. How are you hoping to get the model to attack except by telling it that it’s in a control evaluation and you want it to attack? It seems that you are definitely going to have to handle the sandbagging.
FWIW I agree with you and wouldn’t put it the way it is in Roger’s post. Not sure what Roger would say in response.