JustisMills comments on The Data Wall is Important

JustisMills 10 Jun 2024 22:51 UTC
2 points
Mostly just public text, I think. But I’m not sure how much more you get out of e.g. video transcripts. Maybe a lot! But it wouldn’t surprise me if that was notably worse as a source.
- Yair Halberstadt 11 Jun 2024 4:03 UTC
  8 points
  Not video transcripts, but video itself. A single frame of video contains far more data than a single text token, and you can train an AI as a next-frame predictor much as you can train one as a next-token predictor.
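A minimal sketch of the two points above, under illustrative assumptions. `next_item_pairs`, `raw_frame_bits` and `max_token_bits` are hypothetical helpers. The 256×256 RGB frame and the 50,257-entry vocabulary (the size of GPT-2's) are example figures only. The sketch shows that the (context, target) pairs for next-frame prediction are built exactly as they are for next-token prediction; only the model and the loss would differ. It also compares the raw size of one frame with the most information a single token can carry. Raw pixel counts overstate how much *useful* data a frame holds, because consecutive frames are highly redundant, so treat the ratio as an upper bound.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def next_item_pairs(seq: Sequence[T], context_len: int) -> List[Tuple[Tuple[T, ...], T]]:
    """Build (context, target) pairs for next-item prediction.

    Hypothetical helper: the same construction applies whether each item
    is a text token or a video frame.
    """
    pairs = []
    for i in range(context_len, len(seq)):
        context = tuple(seq[i - context_len:i])  # the preceding items
        pairs.append((context, seq[i]))          # the item to predict
    return pairs


def raw_frame_bits(width: int, height: int, channels: int = 3, bits_per_channel: int = 8) -> int:
    """Uncompressed size of one frame in bits (ignores inter-frame redundancy)."""
    return width * height * channels * bits_per_channel


def max_token_bits(vocab_size: int) -> float:
    """Upper bound on the information in one token: log2 of the vocabulary size."""
    return math.log2(vocab_size)


if __name__ == "__main__":
    # Next-token prediction: tokens as items.
    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    for ctx, target in next_item_pairs(tokens, 2):
        print(ctx, "->", target)

    # Next-frame prediction: hypothetical tiny 2x2 grayscale frames as items.
    frames = [((0, 0), (0, 0)), ((0, 9), (0, 0)), ((9, 9), (0, 0)), ((9, 9), (9, 0))]
    for ctx, target in next_item_pairs(frames, 2):
        print(ctx, "->", target)

    # Raw size of one 256x256 RGB frame vs. the information bound for one token.
    frame_bits = raw_frame_bits(256, 256)
    token_bits = max_token_bits(50257)
    print(f"frame: {frame_bits} bits, token: <= {token_bits:.2f} bits, "
          f"ratio ~ {frame_bits / token_bits:,.0f}")
```

The pair-building function is deliberately type-agnostic to make the analogy concrete; a real video model would typically operate on compressed or latent representations of frames rather than raw pixels.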