I agree with this, but for LLMs/AI to be as impactful as LWers believe, I think they need in practice to be essentially close to 100% correct/reliable, and I think reliability is underrated as a reason why LLMs aren’t nearly as useful as tech people want them to be:
https://www.lesswrong.com/posts/YiRsCfkJ2ERGpRpen/?commentId=YxLCWZ9ZfhPdjojnv
I do think reliability is quite important. As one potential counterargument, though, you can get by with lower reliability if you add additional error-checking and error-correcting steps. The research I’ve seen is somewhat mixed on how good LLMs are at catching their own errors (but I haven’t dived into it deeply or tried to form a strong opinion from that research).
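For concreteness, here's a toy sketch of that point (my own illustration, not something from the thread): a check-and-retry loop can lift effective reliability above the base model's, assuming the checker never rejects correct answers, its error detection is roughly independent of the generator's errors, and each attempt is independent.

```python
# Toy model: base model is correct with probability p; an independent checker
# catches an incorrect answer with probability c, triggering a retry; an
# uncaught error is final. All of these independence assumptions are
# simplifications for illustration only.

def effective_reliability(p: float, c: float, max_retries: int) -> float:
    """Probability of ending with a correct answer after up to `max_retries` retries."""
    correct = 0.0
    reach = 1.0  # probability we are still in the retry loop
    for _ in range(max_retries + 1):
        correct += reach * p        # this attempt succeeds
        reach *= (1 - p) * c        # attempt fails AND the checker catches it
    return correct

if __name__ == "__main__":
    # e.g. 78% base reliability, a checker that catches 80% of errors, 3 retries
    print(round(effective_reliability(0.78, 0.80, 3), 4))  # ≈ 0.9457 under these assumptions
```

The catch, of course, is whether the checker's errors really are independent of the generator's, which is exactly what the mixed research on LLM self-correction bears on.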
One point I make in ‘LLM Generality is a Timeline Crux’: if reliability is the bottleneck, that seems like a substantial point in favor of further scaling solving the problem. If it’s a matter of getting from, say, 78% reliability on some problem to 94%, that seems like exactly the sort of thing scaling will fix (since in fact we’ve seen Number Go Up with scale on nearly all capabilities benchmarks). Whereas that seems less likely if there are some kinds of problems that LLMs are fundamentally incapable of solving, at least with the current architectural & training approach.
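As a back-of-the-envelope illustration of why that 78% → 94% jump matters so much (my numbers, echoing the example above and assuming independent steps): per-step reliability compounds over multi-step tasks, so a modest per-step gain takes long chains from hopeless to plausible.

```python
# Compare end-to-end success rates for multi-step tasks at two per-step
# reliability levels, assuming steps succeed or fail independently.

for steps in (1, 5, 10, 20):
    p_low = 0.78 ** steps
    p_high = 0.94 ** steps
    print(f"{steps:>2} steps: 78%/step -> {p_low:.1%} end-to-end, "
          f"94%/step -> {p_high:.1%} end-to-end")

# 10 steps: ~8% vs ~54%; 20 steps: ~0.7% vs ~29% (under the independence assumption).
```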
This is why I mostly buy the scaling thesis, and the only real crux is whether @Bogdan Ionut Cirstea or @jacob_cannell is right about timelines.
I do believe some algorithmic improvements matter, but I don’t think they will be nearly as much of a blocker as raw compute, and my pessimistic estimate is that the critical algorithms could be discovered within 24–36 months, assuming we don’t already have them.
@jacob_cannell’s timeline and model are here:
https://www.lesswrong.com/posts/3nMpdmt8LrzxQnkGp/ai-timelines-via-cumulative-optimization-power-less-long
@Bogdan Ionut Cirstea’s timeline and models are here:
https://x.com/BogdanIonutCir2/status/1827707367154209044
https://x.com/BogdanIonutCir2/status/1826214776424251462
https://x.com/BogdanIonutCir2/status/1826032534863622315
(I’ll note that my timeline is both quite uncertain and potentially unstable, so I’m not sure how different it is from Jacob’s, all things considered; but yup, that’s roughly my model.)