The companies will have an incentive to make an AI slightly smarter than their competition. And if there is a law against it, they will try to hack it somehow… for example, they will try to make their AI do worse of the official government benchmarks but better at things their users care about. Or perhaps make an AI with IQ 200 and tell it to act like it has IQ 100 when it suspects it is doing a government test.
Or perhaps make an AI with IQ 200 and tell it to act like it has IQ 100 when it suspects it is doing a government test.
Being investigated these days as ‘sandbagging’; there’s a good new paper on that from some of my MATS colleagues.
The companies will have an incentive to make an AI slightly smarter than their competition. And if there is a law against it, they will try to hack it somehow
Agree but that’s true of regulation in general. Do you think it’s unusually true of regulation along these lines, vs eg existing eval approaches like METR’s?
The companies will have an incentive to make an AI slightly smarter than their competition. And if there is a law against it, they will try to hack it somehow… for example, they will try to make their AI do worse of the official government benchmarks but better at things their users care about. Or perhaps make an AI with IQ 200 and tell it to act like it has IQ 100 when it suspects it is doing a government test.
Being investigated these days as ‘sandbagging’; there’s a good new paper on that from some of my MATS colleagues.
Agree but that’s true of regulation in general. Do you think it’s unusually true of regulation along these lines, vs eg existing eval approaches like METR’s?