I was figuring GPT-4 was already trained on a sizable fraction of the internet, and GPT-5 would be trained on basically all the text (plus maybe some non-text data, not sure). Is this wrong?
Oh hmm, that could be true. I suspect data curation is too important for that, though: there are significant gains to be had by not including confusing data as positive examples.
But things like pre-training with preferences should take care of that concern, no? Just mark good stuff with a magic good-stuff token, but allow the transformer to refine features for everything.
Yeah, could be. I'm going to abstain from any further claims; I only have so much hunch fluid here.
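
For concreteness, the "mark good stuff with a magic good-stuff token" idea above could look something like the sketch below on the data side. This is a minimal, hypothetical sketch: the control-token names, the quality_score heuristic, and the threshold are all placeholders for illustration, not anyone's actual pipeline.

```python
# Sketch of conditional pretraining data prep: instead of filtering out
# low-quality documents, prepend a control token so the model still learns
# features from everything but knows which distribution it is seeing.
# GOOD_TOKEN, BAD_TOKEN, quality_score, and the threshold are hypothetical.

GOOD_TOKEN = "<|good|>"   # placeholder control token for curated text
BAD_TOKEN = "<|bad|>"     # placeholder control token for noisy/confusing text

def quality_score(doc: str) -> float:
    """Stand-in for a real quality classifier or curation heuristic."""
    # Toy heuristic purely for illustration: longer docs with sentence
    # punctuation score higher. A real pipeline would use a learned filter.
    return min(1.0, len(doc) / 500) * (0.5 + 0.5 * ("." in doc))

def tag_document(doc: str, threshold: float = 0.5) -> str:
    """Prepend a quality control token rather than dropping the document.

    Every document still contributes to the pretraining loss, so the model
    can refine features on all of the data; the token just signals quality.
    """
    tag = GOOD_TOKEN if quality_score(doc) >= threshold else BAD_TOKEN
    return f"{tag} {doc}"

if __name__ == "__main__":
    corpus = [
        "A clear, well-edited explanation of gradient descent. " * 10,
        "lol idk random spam spam spam",
    ]
    for doc in corpus:
        print(tag_document(doc)[:60], "...")
```

The intended payoff of a setup like this is that confusing data still shapes the representations during training, while at inference time you would prepend GOOD_TOKEN so the model imitates only the curated distribution.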