I’m having trouble discerning a difference between our opinions, as I expect a “kind-of AGI” to come out of LLM tech, given enough investment. Re: code assistants, I’m generally disappointed with GitHub Copilot. It’s not unusual that I think “wow, good job”, but bad completions are commonplace, especially when I ask a question in the sidebar (which should be using a bigger LLM). Its (very hallucinatory) response typically demonstrates that it doesn’t understand our (relatively small) codebase very well, to the point where I only occasionally bother asking. (I keep wondering: did no one at GitHub think to generate an outline of the app that could fit in the context window?)
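For concreteness, here’s a rough sketch of the kind of “app outline” I have in mind: walk the repo, pull out top-level class and function signatures, and emit a compact summary small enough to prepend to the prompt. This is purely a hypothetical illustration of the idea (nothing to do with how Copilot actually works), and the names and the character budget are made up.

```python
# Hypothetical sketch: build a compact "outline" of a Python codebase that
# could fit in an LLM's context window. Illustration only, not Copilot's method.
import ast
from pathlib import Path

def outline_file(path: Path) -> str:
    """Return one line per top-level class/function: name plus rough signature."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    lines = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
            lines.append(f"class {node.name}: " + ", ".join(methods))
    return "\n".join(lines)

def outline_repo(root: str, max_chars: int = 8000) -> str:
    """Concatenate per-file outlines, truncated to a budget that fits in context."""
    parts = []
    for path in sorted(Path(root).rglob("*.py")):
        body = outline_file(path)
        if body:
            parts.append(f"# {path}\n{body}")
    return "\n\n".join(parts)[:max_chars]

if __name__ == "__main__":
    print(outline_repo("."))
```

Even something this crude would give the sidebar model a map of the codebase instead of making it guess from a handful of open files.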
Yes, I agree our views are quite close. My expectations closely match what you say here:
Although LLMs badly suck at reasoning, my AGI timelines are still kinda short―roughly 1 to 15 years for “real” AGI, with quasi-AGI in 2 to 6 years―mainly because so much funding is going into this, and because only one researcher needs to figure out the secret, and because so much research is being shared publicly, and because there should be many ways to do AGI, and because quasi-AGI (if invented first) might help create real AGI.
Basically I just want to point out that the progression of competence in recent models seems pretty impressive, even though the absolute values are low.
For instance, for writing code I think the following progression of models (including only ones I’ve personally tested enough to have an opinion on) shows a clear trend of increasing competence with later release dates:
GitHub Copilot (pre-GPT-4) < GPT-4 (the first release) < Claude 3 Opus < Claude 3.5 Sonnet
I’m holding in my mind the possibility that the next versions (GPT-5 and/or Claude Opus 4) will really impress me, though I don’t feel confident of that. I am pretty confident that the versions after next (e.g. GPT-6 / Claude Opus 5) will impress me and actually be useful for RSI.
From this list, Claude 3.5 Sonnet is the first one competent enough that I find it even occasionally useful. I made myself use the others just to get familiar with their abilities, but their outputs weren’t worth the time and effort on average.