johnswentworth comments on [Link] A minimal viable product for alignment

johnswentworth 9 Apr 2022 17:31 UTC
LW: -2 AF: -2
That falls squarely under the “other reasons to think our models are not yet deceptive” category, i.e. we have priors that we’ll see models which are bad at deception before models become good at deception. The important evidential work there is being done by the prior.