That is fascinating. I hadn’t seen his “task AGI” plan, and I agree it overlaps this proposal more heavily than any other work I was aware of. What’s most striking is that YK doesn’t currently endorse that plan, even though it looks to me as though one main reason he calls it “insanely difficult” has been greatly mitigated by the success of LLMs at understanding human semantics, and therefore human preferences. We are already well up his Do-What-I-Mean hierarchy, arguably at an adequate level for safety/success even before the inevitable improvements on the way to AGI. In addition, the slow takeoff path we’re on seems to make the project easier (although less likely to allow a pivotal act before we have many AGIs causing coordination problems).
So why does YK think we should Shut It Down instead of building DWIM AGI? I’ve been trying to figure this out. I think his principal reasons are two. First, reinforcement learning sounds like a good way to get any central goal somewhat wrong, and being somewhat wrong could well be too much for survival. As I mentioned in the article, I think we have good alternatives to RL alignment, particularly for the AGI we’re most likely to build first, and I don’t think YK has ever considered proposals of that type. Second, he thinks that humans are stunningly foolish, and that competitive race dynamics will make them even more prone to critical errors, even on a project that’s in principle quite accomplishable. On this, I’m afraid I agree. So if I were in charge, I would indeed Shut It Down instead of shooting for DWIM alignment. But I’m not, and neither is YK. He thinks pushing for a shutdown is worth trying, if only to slow AGI progress; I think it’s more critical to use the time we’ve got to refine the alignment approaches that are most likely to actually be deployed.