Not Richard, but I basically endorse that description as a description of my own view. (Note however that we don’t yet know that Transformers-trained-by-SGD-on-text-prediction can’t reason; I for one am not willing to claim that scaling even further will not result in reasoning.)
It’s not a certainty—it’s plausible that text prediction is enough, if you just improved the architecture and learning algorithm a little bit—but I doubt it, except in some degenerate sense that you could put a ton of information / inductive bias into the architecture and make it an AGI that way.
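To make "put inductive bias into the architecture" concrete, here is a toy sketch (purely illustrative Python, not any particular system): a 1-D convolution hard-codes the assumption that the same local pattern matters wherever it appears, so it is translation-equivariant by construction and its parameter count doesn't grow with the input size, whereas a fully-connected layer has to learn any such structure from data.

```python
# Toy illustration (hypothetical, not any particular system) of architectural
# inductive bias: a 1-D convolution builds in the assumption that the same
# local pattern matters everywhere, so it is translation-equivariant by
# construction; a fully-connected layer builds in no such assumption.

def conv1d(xs, kernel):
    """Slide one shared kernel over the input: the 'built-in assumption'."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, xs[i:i + k]))
            for i in range(len(xs) - k + 1)]

def fully_connected(xs, weights):
    """One independent weight per (input, output) pair: no built-in assumption."""
    return [sum(w * x for w, x in zip(row, xs)) for row in weights]

if __name__ == "__main__":
    signal = [0, 0, 1, 2, 1, 0, 0, 0]
    shifted = [0, 0, 0, 0, 1, 2, 1, 0]   # same pattern, two steps later
    kernel = [1, -2, 1]

    out = conv1d(signal, kernel)
    out_shifted = conv1d(shifted, kernel)
    # Equivariance: shifting the input just shifts the output.
    print(out, out_shifted, out[:-2] == out_shifted[2:])  # ... True

    # The fully-connected map needs len(signal) weights per output unit and
    # learns nothing about shifts unless the data teaches it.
    weights = [[0] * len(signal) for _ in range(3)]
    print(fully_connected(signal, weights))
```

The more of the work you push into assumptions like this, the less the learning algorithm itself has to discover, which is the sense in which you could, in the limit, build most of what the system knows into the architecture.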
I endorse Steve’s description as a caricature of my view, and also Rohin’s comment. To flesh out my view a little more: I think that GPT-3 doing so well on language without (arguably) being able to reason is the same type of evidence as Deep Blue or AlphaGo doing well at board games without being able to reason (although significantly weaker). In both cases it suggests that just optimising for this task is not sufficient to create general intelligence. While it now seems pretty unreasonable to think that a superhuman chess AI would by default be generally intelligent, that's not too far off from what people used to think.
Now, it might be the case that the task doesn’t matter very much for AGI if you “put a ton of information / inductive bias into the architecture”, as Rohin puts it. But I interpret Sutton to be arguing against our ability to do so.
“We’ll eventually invent a different architecture-and-learning-algorithm that is suited to reasoning”
There are two possible interpretations of this, one of which I agree with and one of which I don’t. I could either interpret you as saying that we’ll eventually develop an architecture/learning algorithm biased towards reasoning ability, which I disagree with.
Or you could be saying that future architectures will be capable of reasoning in ways that transformers aren’t, by virtue of just being generally more powerful. Which seems totally plausible to me.
Yeah, I think that reasoning, along with various other AGI prerequisites, requires an algorithm that does probabilistic programming / analysis-by-synthesis during deployment. And I think that trained Transformer models don’t do that, no matter what their size and parameters are. I guess I should write a post about why I think that—it’s a bit of a hazy tangle of ideas in my mind right now. :-)
(I’m more-or-less saying the interpretation you disagree with in your second-to-last paragraph.)
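To gesture at what I mean by probabilistic programming / analysis-by-synthesis, here is a minimal sketch (a hypothetical toy example in Python, not a description of any real system): at deployment time the model proposes candidate generative hypotheses, synthesizes the data each hypothesis would produce, and re-weights the hypotheses by how well that synthetic data matches what was actually observed, rather than producing an answer in a single learned forward pass.

```python
# Toy sketch of "analysis-by-synthesis" inference (hypothetical example):
# explain noisy observations by searching over candidate generative
# hypotheses and scoring each by how well the data it would synthesize
# matches what was actually observed.

import itertools
import math
import random

def synthesize(a, b, xs):
    """Generative model: the data that hypothesis (a, b) would produce."""
    return [a * x + b for x in xs]

def log_likelihood(ys_obs, ys_pred, noise_sd=0.5):
    """Gaussian log-likelihood of the observations under one hypothesis."""
    return sum(-0.5 * ((yo - yp) / noise_sd) ** 2
               - math.log(noise_sd * math.sqrt(2 * math.pi))
               for yo, yp in zip(ys_obs, ys_pred))

def infer(xs, ys_obs, grid=range(-3, 4)):
    """Analysis-by-synthesis: propose, synthesize, compare, re-weight."""
    scores = {}
    for a, b in itertools.product(grid, grid):           # propose a hypothesis
        ys_pred = synthesize(a, b, xs)                    # synthesize its predictions
        scores[(a, b)] = log_likelihood(ys_obs, ys_pred)  # score against the data
    # Normalize into a posterior (uniform prior over the grid).
    m = max(scores.values())
    weights = {h: math.exp(s - m) for h, s in scores.items()}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

if __name__ == "__main__":
    random.seed(0)
    xs = [0, 1, 2, 3, 4]
    true_a, true_b = 2, -1
    ys_obs = [true_a * x + true_b + random.gauss(0, 0.5) for x in xs]
    posterior = infer(xs, ys_obs)
    best = max(posterior, key=posterior.get)
    print("most probable hypothesis:", best, round(posterior[best], 3))
```

The point is just the shape of the computation: proposal, synthesis, and comparison all happen at inference time, which is the part I’m claiming a trained Transformer’s forward pass doesn’t do.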
Got it!
Thanks again for explaining!