That’s still just 20T tokens.
10M papers per year x 10,000 tokens per paper x 30 years = 3T tokens.
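A quick sketch of that back-of-envelope arithmetic (the variable names are mine; the per-paper and per-year figures are the rough assumptions quoted above, not measured values):

```python
# Rough estimate of tokens available from academic papers.
# All inputs are assumptions from the comment above, not measured figures.
papers_per_year = 10_000_000   # ~10M papers published per year
tokens_per_paper = 10_000      # ~10k tokens per paper
years = 30                     # ~30 years of usable archive

paper_tokens = papers_per_year * tokens_per_paper * years
print(f"{paper_tokens:.1e} tokens (~{paper_tokens / 1e12:.0f}T)")
# -> 3.0e+12 tokens (~3T)
```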
You raise the possibility that data quality might be important, and that maybe “papers/theses” are higher quality than the data the Chinchilla scaling laws were identified on (The Pile); I don’t really have a good intuition here.
I spent a little while trying to find upload numbers for the other video platforms, to no avail. Per Wikipedia, Twitch is the 3rd-largest video platform worldwide (though this doesn’t count apps, esp. TikTok/Instagram). Twitch has an average of 100,000 streams going on at any given time x 3e8 tokens per video-year (x maybe 5 years) ≈ 150T tokens, the same order of magnitude as YouTube. So this does convince me that there are probably a few more entities with this much video data.
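The same kind of sketch for the Twitch figure, treating the concurrency, tokens-per-video-year, and five-year archive numbers as rough assumptions:

```python
# Rough estimate of tokens available from Twitch video.
# 100k concurrent streams means ~100k stream-years of video per calendar year.
concurrent_streams = 100_000        # average streams live at any given time
tokens_per_video_year = 3e8         # assumed tokens per year of continuous video
archive_years = 5                   # assumed depth of usable archive

twitch_tokens = concurrent_streams * tokens_per_video_year * archive_years
print(f"{twitch_tokens:.1e} tokens (~{twitch_tokens / 1e12:.0f}T)")
# -> 1.5e+14 tokens (~150T)
```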
I agree that if you put enough of these together, there are probably ~10 actors that can scrape together >200T tokens. This is important for coordination; it means the number of organizations that can play at this level will be heavily bottlenecked, potentially for years (until a lot more data can be generated, which won’t be free). It seems to me that these ~10 actors are large, highly legible entities that are already well known to the US or Chinese governments. This could be a meaningful lever for mitigating the race-dynamic fear that “even if we don’t do it, someone else will”, reducing everything to a two-party US-China negotiation.
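Purely as an illustration of how a single actor might clear the 200T bar, here is a sum of the rough estimates floated in this thread (every figure is an estimate from the discussion, not a measured dataset size):

```python
# Illustrative aggregation of the rough corpus estimates from this thread.
estimated_corpora_T = {
    "text (the '20T tokens' above)": 20,
    "papers/theses (3T above)": 3,
    "YouTube-scale video (~100T)": 100,
    "Twitch-scale video (~150T)": 150,
}

total_T = sum(estimated_corpora_T.values())
print(f"combined: ~{total_T}T tokens; clears 200T: {total_T > 200}")
# -> combined: ~273T tokens; clears 200T: True
```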
The big problem is that the Cold War mentality is back, and both sides will compete far more than they cooperate. Combine this with a bit of an arms race between China and the US, and the chances for cooperation on existential risk are remote.
This is a separate discussion, but it is important to point out that during the literal Cold War the opposing powers did cooperate on existential risk reduction. Granted, before that, two cities were burned to ash and we played apocalypse chicken in Cuba.
Two more points:
The specific upper bound does matter if we’re worried about superintelligence. If easy-to-get data instead capped out at 10 quadrillion tokens, it’d be easy to blow past 10T-parameter models; if the data ceiling conveniently lands us around human-level parameter counts, we might be more likely to be dealing with “fast parallel von Neumanns” than a basilisk, at least initially.
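To make the parameter-count claim concrete, here is a rough sketch using the common Chinchilla rule of thumb of roughly 20 training tokens per parameter (the 20x ratio is an approximation from Hoffmann et al., not a hard law, and the helper name is mine):

```python
# Chinchilla-style heuristic: compute-optimal parameter count ~= tokens / 20.
def chinchilla_optimal_params(tokens: float, tokens_per_param: float = 20.0) -> float:
    return tokens / tokens_per_param

for label, tokens in [("~200T easy-to-get tokens", 2e14),
                      ("10 quadrillion tokens", 1e16)]:
    print(f"{label}: ~{chinchilla_optimal_params(tokens):.0e} params")
# -> ~200T tokens supports ~1e+13 (10T) params; 1e16 tokens supports ~5e+14 params
```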
Just to register a prediction: I would be very surprised if photos have anywhere near as much information content as text/video, given their relative lack of long-term causal structure.
In short, while concerted effort could plausibly give us human-level intelligence, it is unlikely to go superhuman and FOOM.
I wouldn’t go that far; using these systems to do recursive self-improvement via different learning paradigms (e.g. by designing simulators) could still get FOOM; it just seems less likely to me to happen by accident in the ordinary course of SSL training.