In particular, the goal is to create a notion of transparency strong enough that an attempted deception would be completely transparent.
Is the idea here that either:
Creating a version of transparency this strong would also let us mitigate less extreme forms of mesa misalignment, i.e. this form of transparency is strong enough that it ‘covers’ the other cases automatically; or
Deception should be treated separately from other forms of mesa misalignment?