Here’s a chart of one of the benchmarks the GPT-NAS paper tests on. The GPT-NAS paper is like… not off trend? Not even SOTA? Honestly, looking at all these results, my tentative guess is that the differences are basically noise for most techniques. The state space is so tiny that I doubt any of these methods really leverage actual regularities in it.
Probably nothing, honestly.
From the Abstract:
They weren’t aiming for SOTA! What happens when they do?