faul_sname comments on What are the strongest arguments for very short timelines?

faul_sname 23 Dec 2024 21:10 UTC

> Simply testing interpolations and extrapolations (e.g. scaling up old forgotten ideas on modern hardware) seems highly likely to reveal plenty of successful new concepts, even if the hit rate per attempt is low.

Is this bottlenecked by programmer time or by compute cost?
- Nathan Helm-Burger 23 Dec 2024 21:48 UTC
  Both? If you increase only one of the two, the other becomes the bottleneck?
  
  I agree this means that the decision to devote substantial compute both to inference and to experiments designed by AI researchers carries a large cost. Presumably, as the AI researchers become more competent, it gets easier to trust them not to waste their assigned experiment compute.
  
  There was discussion on Dwarkesh Patel’s interview with researcher friends where someone mentioned that AI researchers are already restricted by the compute granted to them for experiments. Probably also by the work hours per week they are allowed to spend on novel “off the main path” research.
  
  So for there to be a big surge in AI R&D, it would need to be prioritized at a high level. That would be a change of direction from focusing primarily on rapidly scaling current techniques and putting out slightly better products ASAP.
  
  So yes, if you think that this priority shift won’t happen, then you should doubt that the increase in R&D speed my model predicts will occur.
  
  But what would that world look like? Probably a world where scaling continues to pay dividends, and getting to AGI is more straightforward than Steve Byrnes or I expect.
  
  I agree that that’s a substantial probability, but it’s also an AGI-soon sort of world.
  
  I argue that for AGI to be not-soon, you need both scaling and algorithmic research to fail.
  - faul_sname 24 Dec 2024 7:12 UTC
    > Both? If you increase only one of the two, the other becomes the bottleneck?
    
    My impression, based on talking to people at labs plus things I’ve read, is that:
    
    * Most AI researchers have no trouble coming up with useful ways of spending all of the compute available to them.
    * Most of the expense of hiring AI researchers is the compute cost of their experiments rather than their salary.
    * The big scaling labs try their best to hire the very best people they can get their hands on and concentrate their resources heavily into just a few teams, rather than trying to hire everyone with a pulse who can rub two tensors together.
    
    (Very open to correction by people closer to the big scaling labs).
    
    My model, then, says that compute availability is a constraint that binds much harder than programming or research ability, at least as things stand right now.
    
    > There was discussion on Dwarkesh Patel’s interview with researcher friends where someone mentioned that AI researchers are already restricted by the compute granted to them for experiments. Probably also by the work hours per week they are allowed to spend on novel “off the main path” research.
    
    Sounds plausible to me, especially since benchmarks encourage a focus on the ability to hit the target at all rather than the ability to either succeed or fail cheaply. The latter is what matters in domains where the salary or electric bill of the experiment designer is an insignificant fraction of the total cost of the experiment.
    
    > But what would that world look like? [...] I agree that that’s a substantial probability, but it’s also an AGI-soon sort of world.
    
    Yeah, I expect it’s a matter of “dumb” scaling plus experimentation rather than any major new insights being needed. If scaling hits a wall that training on generated data + fine-tuning + routing + specialization can’t overcome, I do agree that innovation becomes more important than iteration.
    
    My model is not just “AGI-soon” but “the more permissive thresholds for when something should be considered AGI have already been met, and more such thresholds will fall in short order, and so we should stop asking when we will get AGI and start asking about when we will see each of the phenomena that we are using AGI as a proxy for”.