I agree with everything you wrote here and in the sibling comment: there are reasonable hopes for bootstrapping alignment as agents grow smarter; but without a concrete bootstrapping proposal with an accompanying argument, <1% P(doom) from failing to bootstrap alignment doesn’t seem right to me.
I suspect this is my biggest crux with the Quintin/Nora worldview, so here is my bid: if Quintin/Nora have an argument for optimism about bootstrapping beyond “it feels like this should work because of iterative design,” I’d like that argument to make it into the forthcoming document.