I think the upshot of those technologies (and similarly for ML assistants) is:
1. It takes longer before you actually face a catastrophe.
2. In that time, you can make faster progress towards an “out”.
By an “out” I mean something like: (i) figuring out how to build competitive aligned optimizers, (ii) coordinating to avoid deploying unaligned AI.
Unfortunately I think [1] is a bit less impactful than it initially seems, at least if we live in a world of accelerating growth towards a singularity. For example, if it’s 2035, the singularity is in 2045, and you were going to face catastrophic failure in 2040, you can’t really delay that failure by much calendar time. So [1] helps by letting you wait until you get fancier technology from the fast outside economy, but it doesn’t give the slow humane economy much more time to “catch up” on its own terms.
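To make that concrete, here’s a toy numerical sketch. It assumes, purely for illustration, hyperbolic growth proportional to 1/(2045 - t) for the fast outside economy, ~3%/year exponential growth for the slow humane economy, and a two-year delay of the failure; none of those specifics is meant as a real forecast.

```python
# Toy illustration. Assumptions (illustrative only): the fast outside economy
# follows hyperbolic growth blowing up at the 2045 singularity, with output
# proportional to 1/(2045 - t), while the slow humane economy grows
# exponentially at ~3%/year.

SINGULARITY = 2045.0

def fast_output(year, base_year=2035.0):
    """Hyperbolic growth of the fast outside economy, normalized to 1.0 in 2035."""
    return (SINGULARITY - base_year) / (SINGULARITY - year)

def slow_output(year, base_year=2035.0, rate=0.03):
    """Exponential growth of the slow humane economy at an assumed ~3%/year."""
    return (1 + rate) ** (year - base_year)

for label, year in [("failure in 2040", 2040.0), ("failure delayed to 2042", 2042.0)]:
    print(f"{label}: fast economy x{fast_output(year):.1f}, "
          f"slow economy x{slow_output(year):.2f} (vs. 2035)")

# Prints:
# failure in 2040: fast economy x2.0, slow economy x1.16 (vs. 2035)
# failure delayed to 2042: fast economy x3.3, slow economy x1.23 (vs. 2035)
#
# Delaying the failure by two calendar years means inheriting much fancier
# technology from the fast economy (2.0x becomes 3.3x), while the slow economy
# only moves from ~1.16x to ~1.23x, so it barely catches up on its own terms.
```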