RE-bench tasks (see page 7 here) are not the kind of AI research where you’re developing new AI paradigms and concepts. The tasks are much more straightforward than that. So your argument is basically assuming without argument that we can get to AGI with just the more straightforward stuff, as opposed to new AI paradigms and concepts.
If we do need new AI paradigms and concepts to get to AGI, then there would be a chicken-and-egg problem in automating AI research. Or more specifically, there would be two categories of AI R&D, with the less important category (e.g. performance optimization and other RE-bench-type tasks) being automatable by near-future AIs, and the more important category (developing new AI paradigms and concepts) not being automatable.
(Obviously you’re entitled to argue / believe that we don’t need new AI paradigms and concepts to get to AGI! It’s a topic where I think reasonable people disagree. I’m just suggesting that it’s a necessary assumption for your argument to hang together, right?)
I disagree. I think the existing body of published computer science and neuroscience research is chock full of loose threads: tons of potential innovations just waiting to be harvested by automated researchers. I’ve mentioned this idea elsewhere. I call it an ‘innovation overhang’.
Simply testing interpolations and extrapolations (e.g. scaling up old forgotten ideas on modern hardware) seems highly likely to reveal plenty of successful new concepts, even if the hit rate per attempt is low.
I think this means a better benchmark would consist of: taking two existing papers, finding a plausible hypothesis which combines the assumptions from the papers, designing and coding and running tests, then reporting on the results.
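To make that benchmark shape concrete, a single task item might look something like the sketch below. This is purely illustrative: the field names and required artifacts are my own assumptions, not part of RE-bench or any existing benchmark spec.

```python
from dataclasses import dataclass

@dataclass
class CombinationTask:
    """One hypothetical task item: combine ideas from two existing papers.

    Every field name here is an illustrative assumption, not part of RE-bench
    or any published benchmark.
    """
    paper_a: str                      # citation or arXiv ID of the first paper
    paper_b: str                      # citation or arXiv ID of the second paper
    compute_budget_gpu_hours: float   # cap on experiment compute for the attempt

    def expected_artifacts(self) -> list[str]:
        # What the automated researcher would have to hand back for grading.
        return [
            "hypothesis.md",      # a plausible hypothesis combining the two papers' assumptions
            "experiment/",        # code that designs, implements, and runs the test
            "results_report.md",  # write-up of what the test actually showed
        ]

# Example item with deliberately generic placeholder papers:
task = CombinationTask(
    paper_a="<older paper with an under-explored idea>",
    paper_b="<recent paper whose setup could stress-test it>",
    compute_budget_gpu_hours=100.0,
)
print(task.expected_artifacts())
```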
So I don’t think “no new concepts” is a necessary assumption for getting to AGI quickly with the help of automated researchers.
Simply testing interpolations and extrapolations (e.g. scaling up old forgotten ideas on modern hardware) seems highly likely to reveal plenty of successful new concepts, even if the hit rate per attempt is low.
Is this bottlenecked by programmer time or by compute cost?
Both? If you increase only one of the two, the other becomes the bottleneck?
I agree this means that the decision to devote substantial compute both to inference and to running the experiments designed by AI researchers is a large cost. Presumably, as the competence of the AI researchers gets higher, it feels easier to trust them not to waste their assigned experiment compute.
In Dwarkesh Patel’s interview with his researcher friends, there was mention that AI researchers are already restricted by the compute granted to them for experiments, and probably also by the work hours per week they are allowed to spend on novel “off the main path” research.
So in order for there to be a big surge in AI R&D there’d need to be prioritization of that at a high level. This would be a change of direction from focusing primarily on scaling current techniques rapidly, and putting out slightly better products ASAP.
So yes, if you think that this priority shift won’t happen, then you should doubt that the increase in R&D speed my model predicts will occur.
But what would that world look like? Probably a world where scaling continues to pay dividends, and getting to AGI is more straightforward than Steve Byrnes or I expect.
I agree that that’s a substantial probability, but it’s also an AGI-soon sort of world.
I argue that for AGI to be not-soon, you need both scaling and algorithm research to fail.
Both? If you increase only one of the two, the other becomes the bottleneck?
My impression, based on talking to people at labs plus stuff I’ve read, is that:
Most AI researchers have no trouble coming up with useful ways of spending all of the compute available to them.
Most of the expense of hiring AI researchers is the compute cost of their experiments rather than their salary.
The big scaling labs try their best to hire the very best people they can get their hands on and concentrate their resources heavily into just a few teams, rather than trying to hire everyone with a pulse who can rub two tensors together.
(Very open to correction by people closer to the big scaling labs).
My model, then, says that compute availability is a constraint that binds much harder than programming or research ability, at least as things stand right now.
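As a rough illustration of how that constraint could bind, here is a toy back-of-envelope comparison. Every number below is an assumed placeholder chosen only to show the shape of the argument, not a figure from any lab.

```python
# Toy comparison of a researcher's salary vs. the compute bill for their experiments.
# All numbers are placeholder assumptions for illustration only.

salary_per_year = 500_000       # fully-loaded researcher cost, USD/year (assumption)
gpu_hour_cost = 2.0             # USD per GPU-hour (assumption)
gpus_per_experiment = 256       # (assumption)
hours_per_experiment = 72       # (assumption)
experiments_per_year = 50       # (assumption)

compute_per_year = (gpu_hour_cost * gpus_per_experiment
                    * hours_per_experiment * experiments_per_year)

print(f"salary:   ${salary_per_year:>12,.0f} / year")
print(f"compute:  ${compute_per_year:>12,.0f} / year")
print(f"compute / salary ratio: {compute_per_year / salary_per_year:.1f}x")
```

Under those assumed numbers the experiment bill comes out to a few times the salary, which is the sense in which adding more programming or research ability without adding compute would not relieve the bottleneck.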
In Dwarkesh Patel’s interview with his researcher friends, there was mention that AI researchers are already restricted by the compute granted to them for experiments, and probably also by the work hours per week they are allowed to spend on novel “off the main path” research.
Sounds plausible to me. Especially since benchmarks encourage a focus on the ability to hit the target at all, rather than the ability to either succeed or fail cheaply, which is what’s important in domains where the salary / electric bill of the experiment designer is an insignificant fraction of the total cost of the experiment.
But what would that world look like? [...] I agree that that’s a substantial probability, but it’s also an AGI-soon sort of world.
Yeah, I expect it’s a matter of “dumb” scaling plus experimentation rather than any major new insights being needed. If scaling hits a wall that training on generated data + fine tuning + routing + specialization can’t overcome, I do agree that innovation becomes more important than iteration.
My model is not just “AGI-soon” but “the more permissive thresholds for when something should be considered AGI have already been met, and more such thresholds will fall in short order, and so we should stop asking when we will get AGI and start asking about when we will see each of the phenomena that we are using AGI as a proxy for”.
It sounds like your disagreement isn’t with drawing a link from RE-bench to (forecasts for) automating research engineering, but is instead with thinking that you can get AGI shortly after automating research engineering due to AI R&D acceleration and already being pretty close. Is that right?
Note that the comment says research engineering, not research science.