Mostly just public text, I think. But I’m not sure how much more you get out of e.g. video transcripts. Maybe a lot! But it wouldn’t surprise me if that was notably worse as a source.
Not video transcripts, video itself. One frame of video contains much more data than one text token, and you can train an AI as a next-frame predictor much as you can a next-token predictor.
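To put a rough number on the frame-versus-token comparison, here is a back-of-the-envelope sketch. The resolution (1920x1080, 24-bit RGB) and the 50,000-entry vocabulary are my own illustrative assumptions, not figures from this thread, and the helper names are made up for the example. It compares raw, uncompressed bits, so it is an upper bound on both sides: real video is highly redundant, and a token rarely carries its full log2(vocab) bits.

```python
import math


def raw_frame_bits(width: int, height: int, channels: int = 3, bits_per_channel: int = 8) -> int:
    """Bits needed to store one uncompressed frame (assumed 8-bit RGB by default)."""
    return width * height * channels * bits_per_channel


def max_token_bits(vocab_size: int) -> float:
    """Upper bound on bits carried by one token drawn from a vocabulary of this size."""
    return math.log2(vocab_size)


# Assumed values for illustration only.
frame_bits = raw_frame_bits(1920, 1080)
token_bits = max_token_bits(50_000)
ratio = frame_bits / token_bits

print(f"raw bits per frame:  {frame_bits:,}")
print(f"max bits per token:  {token_bits:.1f}")
print(f"raw frame/token ratio: {ratio:,.0f}")
```

Even if compression shrinks the usable information per frame by several orders of magnitude, the gap under these assumptions is large enough that the qualitative point probably survives.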