Sure, it even seems a bit tautological: any such metric, to be robust, would need to contain within itself a definition of a dangerously capable AI, so you probably wouldn't even need to train a model to maximize it. You'd be able to just lift the design from the metric directly.