I think this mostly covers the relevant intuitions:
I guess the assumption is that superintelligent ML models/systems may not remain uninterpretable to each other, especially not given the strong incentives to advance interpretability in specific domains/contexts (benefits from cooperation or from making early commitments in commitment races).
It’s the kind of ‘obvious’ strategy that I think sufficiently ‘smart’ people would use already.