Thank you for this comment. I’m curious to understand the source of our disagreement, given that you generally agree with each of the sub-points. Do you really think that the chance of misalignment with LM-based AI systems is above 90%? What exactly do you mean by misalignment in this context, and why do you think it’s the most likely outcome with such AI? Do you think it will happen even if humanity sticks with the paradigm I described (chaining pure language models while avoiding training models on open-ended tasks)?
I also want to note that my argument is less “developing language models was counterfactually a good thing” and more “given that language models have been developed (which is now a historical fact), the safest path towards human-level AGI might be to stick with pure language models”.