Fwiw, I do have the reverse view, but my reason is more that “auditing a trained model” does not have a great story for wins. Like, either you find that the model is fine (in which case it would have been fine if you skipped the auditing) or you find that the model will kill you (in which case you don’t deploy your AI system, and someone else destroys the world instead).
The way I’d put something-like-this is that in order for auditing the model to help (directly), you have to actually be pretty confident in your ability to understand and fix your mistakes if you find one. It’s not like getting a coin to land Heads by flipping it again if it lands Tails—different AGI projects are not independent random variables, if you don’t get good results the first time you won’t get good results the next time unless you understand what happened. This means that auditing trained models isn’t really appropriate for the middle of the skill curve.
Instead, it seems like something you could use after already being confident you’re doing good stuff, as quality control. This sharply limits the amount you expect it to save you, but might increase some other benefits of having an audit, like convincing people you know what you’re doing and aren’t trying to play Defect.
The way I’d put something-like-this is that in order for auditing the model to help (directly), you have to actually be pretty confident in your ability to understand and fix your mistakes if you find one. It’s not like getting a coin to land Heads by flipping it again if it lands Tails—different AGI projects are not independent random variables, if you don’t get good results the first time you won’t get good results the next time unless you understand what happened. This means that auditing trained models isn’t really appropriate for the middle of the skill curve.
Instead, it seems like something you could use after already being confident you’re doing good stuff, as quality control. This sharply limits the amount you expect it to save you, but might increase some other benefits of having an audit, like convincing people you know what you’re doing and aren’t trying to play Defect.