Or perhaps make an AI with IQ 200 and tell it to act like it has IQ 100 when it suspects it is doing a government test.
Being investigated these days as ‘sandbagging’; there’s a good new paper on that from some of my MATS colleagues.
Companies will have an incentive to make an AI slightly smarter than their competition's. And if there is a law against it, they will try to hack around it somehow.
Agree but that’s true of regulation in general. Do you think it’s unusually true of regulation along these lines, vs eg existing eval approaches like METR’s?