I agree with the premise, but not the conclusion of your last point. Any open-source development that significantly lowers resource requirements can also be used by closed models to increase their model/training size at the same cost, thus keeping the gap.
Yeah, these developments benefit closed-source actors too. I think my wording was not precise, and I'll edit it. This argument about algorithmic improvement is an argument that we will have powerful open-source models (and powerful closed-source models), not that the gap between them will necessarily shrink. I think both the gap and the absolute level of capabilities that are open-source are important facts to be modeling, and this argument is mainly about the latter.
Unless there is a ‘peak-capabilities wall’ that current architectures hit and that the combined compute-efficiency algorithmic improvements don't overcome. In that case, the gap would close: any big company that tried to get ahead by naively scaling up compute, plus a few hidden algorithmic advantages, would be unable to get very far ahead because of the ‘peak-capabilities wall’. It would get cheaper to reach the wall, but once there, extra money/compute/data would be wasted. Thus, a shrinking-gap world.
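
To make the two regimes concrete, here is a purely illustrative toy model (not something claimed in this thread): the logarithmic capability curve, the compute figures, the fixed 100x compute ratio between closed and open efforts, and the ceiling value are all assumptions chosen just to show the shape of the argument.

```python
import math

def capability(effective_compute, ceiling=None):
    """Toy capability curve: logarithmic in effective compute,
    optionally saturating at a hard 'peak-capabilities wall'."""
    raw = math.log10(effective_compute)
    return raw if ceiling is None else min(raw, ceiling)

# Assumed numbers: the closed lab always spends 100x the open-source compute,
# and algorithmic progress multiplies everyone's effective compute equally.
OPEN_COMPUTE, CLOSED_COMPUTE = 1e24, 1e26
WALL = 27  # hypothetical ceiling, in the same arbitrary capability units

for efficiency in (1, 10, 100, 1_000):
    gap_no_wall = (capability(CLOSED_COMPUTE * efficiency)
                   - capability(OPEN_COMPUTE * efficiency))
    gap_with_wall = (capability(CLOSED_COMPUTE * efficiency, ceiling=WALL)
                     - capability(OPEN_COMPUTE * efficiency, ceiling=WALL))
    print(f"efficiency x{efficiency:>5}: "
          f"gap without wall = {gap_no_wall:.1f}, "
          f"gap with wall = {gap_with_wall:.1f}")
```

Without a wall, the gap stays constant no matter how large the shared efficiency multiplier gets (the closed lab just reinvests the same gains); with a wall, the gap shrinks and eventually vanishes once the open-source effort also reaches the ceiling.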
I’m not sure if there will be a ‘peak-capabilities wall’ in this way, or if the algorithmic advancements will be creative enough to get around it. The shape of the future in this regard seems highly uncertain to me. I do think it’s theoretically possible to get substantial improvements in peak capabilities and also in training/inference efficiencies. Will such improvements keep arriving relatively gradually as they have been? Will there be a sudden glut at some point when the models hit a threshold where they can be used to seek and find algorithmic improvements? Very unclear.