I think this mostly covers the relevant intuitions:
I guess the assumption is that superintelligent ML models/systems may not remain uninterpretable to each other, especially not given the strong incentives to advance interpretability in specific domains/contexts (benefits from cooperation or from making early commitments in commitment races).
It’s the kind of ‘obvious’ strategy that I think sufficiently ‘smart’ people would use already.