Assume PaLM magically improved to perform 2 standard deviations above the human average. In my model, this would still have only a slow effect on GDP. How long do you think it would take before language models performed >50% of all coordination tasks?
2 standard deviations above the human average with respect to what metric? My whole point is that the metrics people look at in ML papers are not necessarily relevant in the real world and/or the real world impact (say, in revenue generated by the models) is a discontinuous function of these metrics.
I would guess that 2 standard deviations above the human average on commonly used language modeling benchmarks is still far from enough for even 10% of coordination tasks, though by this point models could well be generating plenty of revenue.
I think we are close to agreeing with each other on how we expect the future to look. I certainly agree that real world impact is discontinuous in metrics, though I would blame practical matters rather than poor metrics.