Another source of adversarial robustness issues is the model itself becoming deceptive.
As for this:

> My intuitions are that imagining a system is actively trying to break safety properties is a wrong framing; it conditions on having designed a system that is not safe.
I unfortunately think this is exactly what real-world AI companies are building.