I think that, logically, the safety research needs to do more than incrementally progress toward alignment (the implied claim in your burden-of-proof argument). It needs to speed alignment toward the finish line (working alignment for the AGI we actually build) more than it speeds capabilities toward the finish line of building takeover-capable AGI.
I agree with you that in general, research tends to make progress toward its stated goals.
But isn’t it a little odd that nobody I know of has a specific story for how we get from tuning and interpretability of LLMs to functionally safe AGI and ASI? I do have such a story, but tuning and interpretability play only a minor role in it, despite making up the vast bulk of “safety research”.
Research usually just goes in a general direction, and gets unexpected benefits as well as eventually accomplishing some of its stated goals. But having a more specific roadmap seems wise when some of those “unexpected benefits” might kill everyone.
That’s not to say I think we should shut down safety research; I just think we should have a bit more of a plan for how it accomplishes its stated goals. I’m afraid we’ve gotten a bit distracted from AGI x-risk by focusing on making LLMs safe, when nobody ever thought LLMs by themselves were likely to be very dangerous.