(This comment points out less important technical errata.)
> ChatGPT [...] This was back in the GPT2 / GPT2.5 era
ChatGPT never ran on GPT-2, and there was no GPT-2.5; ChatGPT launched on GPT-3.5.
> with negative RL signals associated with it?
That wouldn’t have happened. Pretraining doesn’t do RL, and I don’t think anyone would have thrown a novel chapter into the supervised fine-tuning and RLHF phases of training.
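To make the distinction concrete: pretraining only ever computes a next-token cross-entropy loss over a document's own tokens, so there's no mechanism by which a "negative RL signal" could attach to a novel chapter; rewards only enter during RLHF, and they attach to responses the model itself generates. Here's a minimal sketch of the two objectives (the `model`, `policy.sample`, and `reward_model` helpers are hypothetical stand-ins, not anyone's actual training code):

```python
import torch.nn.functional as F

# Pretraining: pure next-token prediction. The only "signal" a document
# contributes is a cross-entropy loss on its own tokens -- there is no
# reward, positive or negative, attached to any document.
def pretraining_loss(model, token_ids):
    logits = model(token_ids[:, :-1])  # predict each next token
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * seq, vocab)
        token_ids[:, 1:].reshape(-1),         # shifted targets
    )

# RLHF: a reward model scores model-GENERATED responses, and a policy
# gradient nudges the model toward higher-scoring ones. Rewards attach
# to sampled outputs, never to pretraining documents.
def rlhf_step(policy, reward_model, prompts):
    responses, logprobs = policy.sample(prompts)  # hypothetical helpers
    rewards = reward_model(prompts, responses)
    return -(rewards.detach() * logprobs).mean()  # REINFORCE-style loss
```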