I’m confused again here. Is this implying that a Friendly AI, per the definition above, is not an optimizer?
No. It’s saying the process by which Friendly AI is designed is not an optimizer (although see my caveats in the previous reply about choosing alignment criteria; it’s still technically optimization, but constrained as much as possible to eliminate the usual Goodharting mechanism). The AI itself pretty much has to be an optimizer to do anything useful.
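To make the Goodharting point a bit more concrete, here’s a toy simulation (a minimal sketch; the numbers and the regressional form of Goodhart are my illustrative assumptions, not anything from the original discussion). An optimizer that selects hard on a proxy correlated with the true objective ends up with proxy scores that badly overstate true value:

```python
import numpy as np

rng = np.random.default_rng(0)

# True value and a noisy proxy for 100,000 candidate actions.
true_value = rng.normal(size=100_000)
proxy = true_value + rng.normal(size=100_000)  # proxy = value + independent error

# The optimizer picks the top 0.1% of candidates by proxy score.
top = np.argsort(proxy)[-100:]

print(f"mean proxy score of selected: {proxy[top].mean():.2f}")
print(f"mean true value of selected:  {true_value[top].mean():.2f}")
# The selected candidates score far higher on the proxy than on the
# true objective: selection pressure consumed the proxy/value
# correlation, which is the usual (regressional) Goodhart failure.
```

Constraining the design process, as described above, is about keeping this kind of selection pressure off of the alignment criteria themselves.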
I am very pessimistic about being able to align an AI without any sort of feedback loop on the reward (thus, without optimization). The world’s overall transition dynamics are likely to be chaotic, so the “initial state” of an AI that is provably aligned without feedback would need to be exactly the right one to obtain the outcome we want. It could be that the chaos does not affect what we care about, but I’m unsure about that; even linear systems can be chaotic (at least in infinite dimensions).
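For intuition on why chaos makes the no-feedback route so demanding, here’s a minimal sketch using the logistic map as a stand-in for the world’s transition dynamics (purely an illustrative assumption on my part):

```python
# Sensitivity to initial conditions in the logistic map; r = 4 is a
# standard chaotic regime. Two starting points 1e-12 apart diverge to
# order-1 separation within roughly 40 iterations.
def logistic(x: float, r: float = 4.0) -> float:
    return r * x * (1.0 - x)

x, y = 0.3, 0.3 + 1e-12
for step in range(1, 61):
    x, y = logistic(x), logistic(y)
    if step % 10 == 0:
        print(f"step {step:2d}: separation = {abs(x - y):.3e}")
```

An “initial state” that provably leads to good outcomes would need its precision to outrun this kind of exponential error growth, which is exactly what a feedback loop lets you avoid.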
I’m similarly pessimistic, as it seems quite a hard problem, and after 20 years we still don’t really know how to start (or so I think; maybe MIRI folks feel differently and believe we have made some real progress here). Hence why bootstrapping to alignment may be the best alternative, given that I think totally abandoning the Friendly AI strategy is also a bad choice.