I think timelines are fundamentally driven by scale and compute... I'm skeptical of the idea that a paradigm shift so counter-intuitive is needed that nobody has even conceived of it yet.
Two possible counterarguments:
I’ve heard multiple ML researchers argue that the last real breakthrough in ML architecture was transformers, in 2017. If that’s the case, and if another breakthrough of that size is needed, then maybe the base rate isn’t that high.
If LLMs hit significant limitations, because of the reasoning issue or because of a data wall, then companies & VCs won’t necessarily keep pouring money into ever-bigger clusters, and we won’t get the continued scaling you suggest.
That’s fair. Here are some things to consider:
1 - I think 2017 was not that long ago. My hunch is that the low-level architecture of the network itself is not a bottleneck yet; I’d lean more on training procedures and algorithms. I’d count RLHF and MoE as significant developments, and those are even more recent.
2 - I give maybe a 30% chance of a stall, in the case where little commercial disruption comes of LLMs. I think there will still be enough research going on at the major labs, and even universities working at a smaller scale give a decent chance at efficiency gains and other advances the big labs can incorporate. Then again, if we agree that they won’t build the power plant, that is also my main way of stalling the timeline by 10 years. The reason I only put 30% is that I’m expecting multimodality and Aschenbrenner’s “unhobblings” to buy the industry a couple more years of chances to find profit.
Both of those seem plausible, though the second point seems fairly different from your original ‘timelines are fundamentally driven by scale and compute’.