When I brought up sample inefficiency, I was supporting Mr. Helm-Burger's statement that "there's huge algorithmic gains in … training efficiency (less data, less compute) … waiting to be discovered". You're right of course that a reduction in training data will not necessarily reduce the amount of computation needed. But once again, that's the way to bet.
"a reduction in training data will not necessarily reduce the amount of computation needed. But once again, that's the way to bet."
I'm ambivalent on this. If the analogy between improvement of sample efficiency and generation of synthetic data holds, synthetic data seems reasonably likely to be less valuable than real data (per token). In that case we'd be using all the real data we have anyway, which with repetition is sufficient for training runs of up to about $100 billion (we are at about $100 million right now). Without autonomous agency (not necessarily at researcher level) emerging before that point, there won't be investment to go beyond that scale until much later, when hardware improves and the cost comes down.
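As a rough sanity check of that ~$100 billion figure, here is a back-of-envelope sketch (my own, not anything claimed in the thread). It assumes Chinchilla-style compute-optimal scaling at ~20 tokens per parameter, roughly 5e17 FLOP per dollar of training spend, ~30 trillion tokens of useful unique text, and repetition being worthwhile for somewhere between ~5 and ~15 epochs; every one of those constants is an assumption, not a sourced number.

```python
# Rough sanity check (all constants below are assumptions, not the commenter's figures):
# how far can repeated real text data carry compute-optimal training runs?

def chinchilla_tokens(compute_flop: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal token count assuming C ~ 6*N*D with D ~ tokens_per_param * N."""
    n_params = (compute_flop / (6.0 * tokens_per_param)) ** 0.5
    return tokens_per_param * n_params

# Assumed hardware economics: ~5e17 FLOP per dollar of training spend
# (order of magnitude for current accelerators at realistic utilization and prices).
FLOP_PER_DOLLAR = 5e17

for budget_usd in (1e8, 1e10, 1e11):  # $100M (roughly today) ... $100B
    compute = budget_usd * FLOP_PER_DOLLAR
    tokens = chinchilla_tokens(compute)
    print(f"${budget_usd:,.0f}: ~{compute:.0e} FLOP, "
          f"compute-optimal data ~{tokens / 1e12:.0f}T tokens")

# Assumed stock of useful unique text: ~30T tokens. Data-constrained scaling results
# suggest a few epochs of repetition cost little, with diminishing returns further out.
unique_tokens = 30e12
for epochs in (5, 15):
    print(f"real data at {epochs}x repetition: ~{unique_tokens * epochs / 1e12:.0f}T tokens")
```

Under these assumed numbers, a ~$100 billion compute-optimal run wants a few hundred trillion tokens, which repeated real text can roughly cover, so ~$100 billion does look like the right order of magnitude for where real data (with repetition) runs out.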