I am not sure I fully understand your point, but the problem with detecting sandbagging is that you do not know the actual capability of a model. And I guess that you mean “an anomalous decrease in capability” and not increase?
Regardless, could you spell out more how exactly you’d detect sandbagging?
Nathan’s suggestion is that adding noise to a sandbagging model might increase performance, rather than decrease it as usual for a non-sandbagging model. It’s an interesting idea!
I am not sure I fully understand your point, but the problem with detecting sandbagging is that you do not know the actual capability of a model. And I guess that you mean “an anomalous decrease in capability” and not increase?
Regardless, could you spell out more how exactly you’d detect sandbagging?
Nathan’s suggestion is that adding noise to a sandbagging model might increase performance, rather than decrease it as usual for a non-sandbagging model. It’s an interesting idea!
Oh, I see. This is an interesting idea. I am not sure it will work, but definitely worth trying out!