Teun van der Weij comments on An Introduction to AI Sandbagging

Teun van der Weij 10 May 2024 5:23 UTC
1 point
I am not sure I fully understand your point, but the problem with detecting sandbagging is that you do not know the model's actual capability. And I guess you mean “an anomalous decrease in capability”, not an increase?

Regardless, could you spell out more how exactly you’d detect sandbagging?
- Francis Rhys Ward 14 May 2024 2:49 UTC
  2 points
  Nathan’s suggestion is that adding noise to a sandbagging model might increase its performance, rather than decrease it as would usually happen for a non-sandbagging model. It’s an interesting idea!
  - Teun van der Weij 14 May 2024 11:22 UTC
    1 point
    Oh, I see. This is an interesting idea. I am not sure it will work, but it is definitely worth trying out!
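
A minimal sketch of the check suggested in this thread, assuming you can re-run the same benchmark at several noise levels. The function name `noise_improves_score`, the `margin` threshold, and both toy score curves are hypothetical illustrations and not results from any real model; only the decision rule (does any noised score beat the noise-free score?) reflects the idea discussed above.

```python
from typing import Callable, Sequence


def noise_improves_score(
    evaluate: Callable[[float], float],
    noise_levels: Sequence[float],
    margin: float = 0.05,
) -> bool:
    """Return True if some noised run beats the noise-free score by more than `margin`.

    `evaluate(sigma)` is assumed to return benchmark accuracy after perturbing
    the model with noise of scale `sigma`; sigma = 0.0 means no noise.
    """
    baseline = evaluate(0.0)
    best_noised = max(evaluate(sigma) for sigma in noise_levels)
    return best_noised - baseline > margin


# Toy stand-ins for real evaluations (assumed shapes, not measured data).
def honest_model(sigma: float) -> float:
    # Performance only degrades as noise grows.
    return max(0.0, 0.8 - sigma)


def sandbagging_model(sigma: float) -> float:
    # Underperforms without noise; in this toy, noise disrupts the underperformance.
    return 0.3 if sigma == 0.0 else max(0.0, 0.7 - sigma)


levels = [0.05, 0.1, 0.2]
honest_flag = noise_improves_score(honest_model, levels)
sandbag_flag = noise_improves_score(sandbagging_model, levels)
```

In practice `evaluate` would wrap a full benchmark run, and `margin` would need to be set above the run-to-run variance of that benchmark. Whether real sandbagging models behave like the toy curve is exactly the open question raised in the thread.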