I think an important crux here is whether you think we can build institutions that are reasonably good at checking the quality of AI safety work done by humans.
Why is this an important crux? Is it necessarily the case that if we can reliably check AI safety work done by humans, we can reliably check AI safety work done by AIs which may be optimising against us?
It’s not necessarily the case. But in practice this tends to be a key line of disagreement.