Obviously you’d need to be able to automate at least 90% of what capabilities researchers do today.
Actually, I don’t think so. AIs don’t just substitute for human researchers; they can specialize differently. Suppose (for simplicity) there are two roughly equally good lines of research that can substitute for each other (e.g., they create some fungible algorithmic progress) and capability researchers currently do 50% of each. Further, suppose that AIs can accelerate the first line of research by 30x but are worthless for the second. This could yield >10x acceleration via researchers just focusing on the first line of research (depending on how diminishing returns go).
This doesn’t make a huge difference to my bottom line view, but it seems likely that this sort of change in specialization makes a 2x difference.
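As a toy illustration of the “depending on how diminishing returns go” caveat, here is a sketch under made-up assumptions (my own, purely for illustration): progress on each line scales as effort^alpha, and the two lines add into fungible algorithmic progress.

```python
# Toy model of the specialization argument (illustrative assumptions only):
# progress on each research line scales as (effective effort)**alpha, the two
# lines contribute additively to fungible algorithmic progress, and alpha < 1
# captures diminishing returns within a line.

def progress(effort_line1, effort_line2, alpha):
    return effort_line1**alpha + effort_line2**alpha

for alpha in (0.5, 0.7, 0.9):
    baseline = progress(0.5, 0.5, alpha)        # humans split 50/50, no AI help
    focused = progress(30 * 1.0, 0.0, alpha)    # everyone on line 1, AI gives 30x there
    print(f"alpha={alpha}: overall speedup ~{focused / baseline:.1f}x")

# Mild diminishing returns (alpha near 1) give well over 10x overall;
# strong diminishing returns (alpha = 0.5) give only ~4x.
```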
But 90% is a lot; you’ll be pushing out into the long tail of tasks that require taste, subtle tacit knowledge, etc.
I think it could suffice to do a bunch of relatively banal things extremely fast and cheaply. In particular, it could suffice to do: software engineering, experiment babysitting, experiment debugging, optimization, improved experiment interpretation (e.g., trying to identify the important plots and considerations and presenting them as concisely and effectively as possible), and generally checking experiments prior to launching them.
As an intuition pump, imagine you had nearly free junior hires who run 10x faster and also work all hours. Because they are free, you can run tons of copies. I think this could pretty plausibly speed things up by 10x.
I have a personal suspicion that a surprisingly large fraction of work (possibly but not necessarily limited to “knowledge work”) will turn out to be “AGI complete”, meaning that it will require something approaching full AGI to undertake it at human level.
I’m not sure if I exactly disagree, but I do think there is a ton of variation in the human range, such that I dispute the way you seem to use “AGI complete”. I do think that the systems doing this acceleration will be quite general and capable and will be in some sense close to AGI. (Though less so if this occurs earlier, as in my 20th percentile world.)
And I don’t see current approaches delivering much progress on things I think will be needed for such capability, such as long-term memory, continuous learning, ability to “break out of the chatbox” and deal with open-ended information sources and extraneous information, or other factors that I mentioned in the original post.
Suppose a company specifically trained an AI system to be very familiar with its code base and infrastructure and relatively good at doing experiments for it. Then, it seems plausible that (with some misc schlep) the only needed context would be project-specific context. It seems pretty plausible you can fit the context for tasks humans would do in a week into a 1 million token context window, especially with some tweaks and some forking/sub-agents. And automating 1 week seems like it could suffice for big acceleration, depending on various details. (Concretely, code is roughly 10 tokens per line, and we might expect the AI to write <20k lines including revisions, commands, etc., and to receive not much more than this amount of input. Books are maybe 150k tokens for reference, so the question is whether the AI needs over 6 books of context for 1 week of work. Currently, when AIs automate longer tasks they often do so via fewer steps than humans, spitting out the relevant outputs more directly, so I expect that the context needed for the AI is somewhat less.) Of course, it isn’t clear that models will be able to use their context window as well as humans use longer-term memory.
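Spelling out that back-of-envelope arithmetic (all numbers are the rough estimates from the paragraph above, not measurements):

```python
# Back-of-envelope from the estimates above (rough assumptions, not measurements).
tokens_per_line = 10        # ~10 tokens per line of code
lines_written = 20_000      # upper estimate for a week of work, incl. revisions/commands
output_tokens = tokens_per_line * lines_written   # ~200k tokens of output
input_tokens = output_tokens                      # "not much more than this amount of input"
total_context = output_tokens + input_tokens      # ~400k tokens

context_window = 1_000_000
tokens_per_book = 150_000
print(total_context)                        # 400000 -- fits in a 1M-token window
print(context_window / tokens_per_book)     # ~6.7 "books" of context available
```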
As far as continuous learning goes, what if the AI company does online training of their AI systems based on all internal usage[1]? (Online training = just RL-train on all internal usage based on human ratings or other sources of feedback.) Is the concern that this will be too sample inefficient (even with proliferation or other hacks)? (I don’t think it is obvious this goes either way, but a binary “no continuous learning method is known” doesn’t seem right to me.)
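As a minimal sketch of what “online training on internal usage” could look like at the data-flow level (everything below is hypothetical and stubbed; a real setup would do an actual RL weight update on rated transcripts rather than print a summary):

```python
from dataclasses import dataclass

@dataclass
class Episode:
    prompt: str                   # internal task given to the AI
    response: str                 # what the AI produced
    rating: float | None = None   # human rating or other feedback signal

usage_log: list[Episode] = []

def log_usage(prompt: str, response: str) -> Episode:
    """Every internal use of the model gets logged for later training."""
    episode = Episode(prompt, response)
    usage_log.append(episode)
    return episode

def online_training_step(episodes: list[Episode]) -> None:
    """Placeholder for the actual RL update (e.g. reward-weighted fine-tuning
    on rated transcripts); here it only reports what would be trained on."""
    rated = [e for e in episodes if e.rating is not None]
    print(f"would RL-train on {len(rated)} rated episodes this round")

# Example round: a task gets used internally, rated, and folded back into training.
episode = log_usage("speed up this data-loading script", "...proposed patch...")
episode.rating = 0.8
online_training_step(usage_log)
usage_log.clear()
```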
AIs don’t just substitute for human researchers; they can specialize differently. Suppose (for simplicity) there are two roughly equally good lines of research that can substitute for each other (e.g., they create some fungible algorithmic progress) and capability researchers currently do 50% of each. Further, suppose that AIs can accelerate the first line of research by 30x but are worthless for the second. This could yield >10x acceleration via researchers just focusing on the first line of research (depending on how diminishing returns go).
Thanks for engaging so deeply on this! Good point, this would have some impact.
As an intuition pump, imagine you had nearly free junior hires who run 10x faster and also work all hours. Because they are free, you can run tons of copies. I think this could pretty plausibly speed things up by 10x.
Wouldn’t you drown in the overhead of generating tasks, evaluating the results, etc.? As a senior dev, I’ve had plenty of situations where junior devs were very helpful, but I’ve also had plenty of situations where it was more work for me to manage them than it would have been to do the job myself. These weren’t incompetent people, they just didn’t understand the situation well enough to make good choices and it wasn’t easy to impart that understanding. And I don’t think I’ve ever been sole tech lead for a team that was overall more than, say, 5x more productive than I am on my own – even when many of the people on the team were quite senior themselves. I can’t imagine trying to farm out enough work to achieve 10x of my personal productivity. There’s only so much you can delegate unless the system you’re delegating to has the sort of taste, judgement, and contextual awareness that a junior hire more or less by definition does not. Also you might run into the issue I mentioned where the senior person in the center of all this is no longer getting their hands dirty enough to collect the input needed to drive their high-level intuition and do their high-value senior things.
Hmm, I suppose it’s possible that AI R&D has a different flavor than what I’m used to. The software projects I’ve spent my career on are usually not very experimental in nature; the goal is generally not to learn whether an idea shows promise, but to design and write code implementing a feature spec, for integration into the production system. If a junior dev does a so-so job, I have to work with them to bring it up to a higher standard, because we don’t want to incur the tech debt of integrating so-so code; we’d be paying for it for years. Maybe that plays out differently in AI R&D?
Incidentally, in this scenario, do you actually get to 10x the productivity of all your staff? Or do you just get to fire your junior staff? Seems like that depends on the distribution of staff levels today and on whether, in this world, junior staff can step up and productively manage AIs themselves.
Suppose a company specifically trained an AI system to be very familiar with its code base and infrastructure and relatively good at doing experiments for it. Then, it seems plausible that (with some misc schlep) the only needed context would be project-specific context. …
These are fascinating questions but beyond what I think I can usefully contribute to in the format of a discussion thread. I might reach out at some point to see whether you’re open to discussing further. Ultimately I’m interested in developing a somewhat detailed model, with well-identified variables / assumptions that can be tested against reality.
Wouldn’t you drown in the overhead of generating tasks, evaluating the results, etc.? As a senior dev, I’ve had plenty of situations where junior devs were very helpful, but I’ve also had plenty of situations where it was more work for me to manage them than it would have been to do the job myself. These weren’t incompetent people, they just didn’t understand the situation well enough to make good choices and it wasn’t easy to impart that understanding. And I don’t think I’ve ever been sole tech lead for a team that was overall more than, say, 5x more productive than I am on my own – even when many of the people on the team were quite senior themselves. I can’t imagine trying to farm out enough work to achieve 10x of my personal productivity. There’s only so much you can delegate unless the system you’re delegating to has the sort of taste, judgement, and contextual awareness that a junior hire more or less by definition does not. Also you might run into the issue I mentioned where the senior person in the center of all this is no longer getting their hands dirty enough to collect the input needed to drive their high-level intuition and do their high-value senior things.
I’ve had a pretty similar experience personally, but:
I think serial speed matters a lot, and you’d be willing to go through a bunch more hassle if the junior devs worked 24/7 and at 10x speed.
Quantity can be a quality of its own: if you have truly vast (parallel) quantities of labor, you can be much more demanding and picky. (And make junior devs do much more work to understand what is going on.)
I do think the experimentation point (that AI R&D is more experimental than typical production software) is probably somewhat big, but I’m uncertain.
(This one is breaking with the junior dev analogy, but whatever.) In the AI case, you can train/instruct once and then fork many times. In the analogy, this would be like you spending 1 month training the junior dev (who still works 24/7 and at 10x speed, so 10 months for them) and then forking them into many instances. Of course, perhaps AI sample efficiency is lower. However, my personal guess is that lots of compute spent on learning and aggressive schlep (e.g., proliferation, lots of self-supervised learning, etc.) can plausibly substantially reduce or possibly eliminate the gap (at least once AIs are more capable), similar to how it works for EfficientZero. (A toy calculation of this amortization point is sketched below.)
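A toy calculation of that amortization point, with all numbers made up for illustration: once teaching is a one-time cost shared across many forked copies, the senior person’s per-task overhead shrinks toward just review time.

```python
# Toy model (made-up numbers): senior-hours spent per completed delegated task.
#   d      = senior-hours to just do the task yourself
#   review = senior-hours to review/integrate a delegated task
#   teach  = one-time senior-hours to get the (AI) junior up to speed, amortized
#            over every task done by all forked copies

def senior_leverage(d, review, teach, tasks_across_forks):
    overhead_per_task = review + teach / tasks_across_forks
    return d / overhead_per_task

d, review, teach = 8.0, 1.0, 160.0   # e.g. ~a month of teaching at 160 hours
for n in (10, 100, 1000):
    print(f"{n:>5} tasks across all forks: ~{senior_leverage(d, review, teach, n):.1f}x leverage")

# With a human junior the teaching cost recurs per hire; with forked AIs it is
# paid once, so leverage climbs toward d/review (here 8x) as task count grows.
```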
[1] Confidentiality concerns might prevent training on literally all internal usage.