GPT-2 1.5B is small by today's standards. My hypothesis is that people are unsure whether findings at this scale will generalize to frontier models (or at least to models the size of Llama-3.1-70B), and that is why nobody is working on it.
However, I was impressed by "Pre-Training from Human Preferences". My sense is that pretraining itself could be improved along these lines, and that would be a massive deal for alignment.
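For context, the strongest method in that line of work is "conditional training": each pretraining document gets a control token based on a preference score, so the model learns to condition its generations on the "good" distribution. Below is a minimal sketch of the idea, not the paper's actual implementation; `score_document`, the token names, and the 0.5 threshold are all hypothetical placeholders standing in for a real reward model and tuned hyperparameters.

```python
# Sketch of conditional training for pretraining data, assuming a toy
# scoring function in place of a real preference/reward model.

GOOD, BAD = "<|good|>", "<|bad|>"

def score_document(text: str) -> float:
    """Hypothetical stand-in for a reward model scoring a document."""
    return 1.0 if "please" in text.lower() else 0.0  # toy heuristic

def tag_corpus(docs: list[str], threshold: float = 0.5) -> list[str]:
    """Prepend a control token to each document before pretraining."""
    return [
        (GOOD if score_document(d) >= threshold else BAD) + " " + d
        for d in docs
    ]

if __name__ == "__main__":
    corpus = ["Please be kind to each other.", "Some low-quality rant."]
    for doc in tag_corpus(corpus):
        print(doc)
    # At inference time, prompting with <|good|> steers the model toward
    # the preferred distribution it saw during pretraining.
```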