I think you need to zoom out a bit and look at the implications of these papers. The danger isn’t in what people are doing now; it’s in what they might be doing in a few months, following on from this work. The NAS paper was a proof of concept. What happens when it’s massively scaled up? What happens when efficiency gains translate into further efficiency gains?
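To put the compounding worry in rough terms (my toy framing, nothing from the paper): say round $k$ of search makes the search process itself a factor $f_k$ more efficient, and that efficiency gets reinvested in round $k+1$. The cumulative speedup after $n$ rounds is

$$E_n = \prod_{k=1}^{n} f_k.$$

If per-round gains don’t decay, i.e. $f_k \ge f > 1$ for all $k$, then $E_n \ge f^n$ and the process compounds exponentially. If they do decay, say $f_k = 1 + c\,r^k$ with $0 < r < 1$, then $\log E_n = \sum_{k=1}^{n} \log(1 + c\,r^k) \le c \sum_{k=1}^{n} r^k < \frac{cr}{1-r}$, so the total gain stays bounded no matter how long you run. The whole question is which regime these methods are in.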
Here’s a chart of one of the benchmarks the GPT-NAS paper tests on. The GPT-NAS paper is, like… not off trend? Not even SOTA? Honestly, looking at all these results, my tentative guess is that the differences are basically noise for most techniques; the state space is so tiny that I doubt any of these really leverage actual regularities in it.
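To make the “basically noise” claim concrete, here’s a toy simulation (the space size matches NAS-Bench-201; the accuracy distribution and budget are invented for illustration): if every method were secretly just random search, seed-to-seed spread alone would produce gaps like the ones on the chart.

```python
import random
import statistics

# Hypothetical illustration, not the paper's data: in a small, fixed search
# space, the run-to-run spread of plain random search can be as large as the
# gaps that different NAS methods report between each other.

# Pretend benchmark: 15,625 architectures (NAS-Bench-201's space size), with
# made-up accuracies drawn from a plausible-looking distribution.
SPACE_SIZE = 15_625
_gen = random.Random(0)
accuracies = [_gen.gauss(91.0, 1.5) for _ in range(SPACE_SIZE)]

def random_search(budget: int, rng: random.Random) -> float:
    """Best accuracy found after evaluating `budget` random architectures
    (sampled with replacement, which is fine for illustration)."""
    return max(rng.choice(accuracies) for _ in range(budget))

# Run "the same method" under many seeds and look at the spread.
results = [random_search(budget=100, rng=random.Random(seed))
           for seed in range(50)]

print(f"mean best accuracy:  {statistics.mean(results):.2f}")
print(f"stdev across seeds:  {statistics.stdev(results):.2f}")
print(f"range across seeds:  {min(results):.2f} .. {max(results):.2f}")
# If the seed-to-seed range here is comparable to the gaps between methods on
# the benchmark chart, the ordering on that chart isn't telling you much.
```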
> I think you need to zoom out a bit and look at the implications of these papers. The danger isn’t in what people are doing now; it’s in what they might be doing in a few months, following on from this work. The NAS paper was a proof of concept. What happens when it’s massively scaled up? What happens when efficiency gains translate into further efficiency gains?
Probably nothing, honestly.
> Here’s a chart of one of the benchmarks the GPT-NAS paper tests on. The GPT-NAS paper is, like… not off trend? Not even SOTA? Honestly, looking at all these results, my tentative guess is that the differences are basically noise for most techniques; the state space is so tiny that I doubt any of these really leverage actual regularities in it.
From the Abstract:
They weren’t aiming for SOTA! What happens when they do?