I agree with everything you wrote here and in the sibling comment: there are reasonable hopes for bootstrapping alignment as agents grow smarter; but without a concrete bootstrapping proposal with an accompanying argument, <1% P(doom) from failing to bootstrap alignment doesn’t seem right to me.
I suspect this is my biggest crux with the Quintin/Nora worldview, so here is my bid: if Quintin/Nora have an argument for optimism about bootstrapping beyond “it feels like this should work because of iterative design,” I’d like that argument to make it into the forthcoming document.