Thanks for sketching that out; I appreciate it. Unlearning significantly improving the safety outlook is something I may not have fully priced in.
My guess is that the central place we differ is that I expect dropping in, say, 100k extra capabilities researchers to get us to greater-than-human intelligence fairly quickly (we're already seeing LLMs score better than humans in various areas, so clearly there's no hard barrier at human level), and at that point control gets extremely difficult.
I do certainly agree that there’s a lot of low-hanging fruit in control that’s well worth grabbing.