I think there’s one important factor missing: if you really used evals for regulation, then they would be gamed. I trust more the eval when the company is not actually at stake on it. If it was, there would be a natural tendence for evals to slide towards empty box-checking.
I think there’s one important factor missing: if you really used evals for regulation, then they would be gamed. I trust more the eval when the company is not actually at stake on it. If it was, there would be a natural tendence for evals to slide towards empty box-checking.