+1 for the push for more quantitative models.
(though I would register that trying to form a model with so many knobs to turn is really daunting, so I expect I personally will probably procrastinate a bit before actually putting one together, and I anticipate others may feel similarly)
I mean it’s not so daunting if you mostly just defer to Tom & accept the default settings, but then tweak a few settings here and there.
Also it’s very cheap to fiddle with each setting one by one to see how much of an effect it has. Most of them don’t have much of an effect, so you really only need to focus on a few of them (such as the training requirements and the FLOP gap).