mesaoptimizer comments on Closed-Source Evaluations

mesaoptimizer 15 Jun 2024 18:30 UTC
2 points
Note that the current power differential between evals labs and frontier labs is such that I don’t expect evals labs to have the slack to simply state that a frontier model failed their evals.

You’d need regulation with serious teeth, and competent ‘bloodhound’ regulators watching the space like a hawk, for that to become possible.