I’m not disputing that they were trained with next token prediction log loss (if you read the tech reports they claim to do exactly this) — I’m just disputing the “on the internet” part, due to the use of synthetic data and private instruction following examples.
I’m not disputing that they were trained with next token prediction log loss (if you read the tech reports they claim to do exactly this) — I’m just disputing the “on the internet” part, due to the use of synthetic data and private instruction following examples.