Thane Ruthenis comments on The Field of AI Alignment: A Postmortem, and What To Do About It

Thane Ruthenis 27 Dec 2024 16:54 UTC
9 points
0
Rohin Shah has already explained the basic reasons why I believe the mesa-optimizer-type search probably won’t exist/be findable in the inner workings of the models we encounter: “Search is computationally inefficient relative to heuristics, and we’ll be selecting really hard on computational efficiency.”
I think this statement is quite ironic in retrospect, given how OpenAI’s o-series seems to work (at train-time and at inference-time both), and how much AI researchers hype it up.
By contrast, my understanding is that the sort of search John is talking about retargeting isn’t the brute-force babble-and-prune algorithms, but a top-down heuristical-constraint-based search.
So it is in fact the ML researchers now who believe in the superiority of the computationally inefficient search; not the agency theorists.
- Rohin Shah 29 Dec 2024 16:17 UTC
  11 points
  2
  Parent
  I think this statement is quite ironic in retrospect, given how OpenAI’s o-series seems to work
  I stand by my statement and don’t think anything about the o-series model invalidates it.
  And to be clear, I’ve expected for many years that early powerful AIs will be expensive to run, and have critiqued people for analyses that implicitly assumed or implied that the first powerful AIs will be cheap, prior to the o-series being released. (Though unfortunately for the two posts I’m thinking of, I made the critiques privately.)
  There’s a world of difference between “you can get better results by thinking longer” (yeah, obviously this was going to happen) and “the AI system is a mesa optimizer in the strong sense that it has an explicitly represented goal such that you can retarget the search” (I seriously doubt it for the first transformative AIs, and am uncertain for post-singularity superintelligence).
  - Thane Ruthenis 29 Dec 2024 16:42 UTC
    4 points
    0
    Parent
    To lay out my arguments properly:
    (1) “Search is ruinously computationally inefficient” does not work as a counter-argument against the retargetability of search, because the inefficiency argument applies to babble-and-prune search, not to the top-down heuristical-constraint-based search that was/is being discussed.
    There are valid arguments against easily-retargetable heuristics-based search as well (I do expect many learned ML algorithms to be much messier than that). But this isn’t one of them.
    (2) ML researchers are currently incredibly excited about the inference-time scaling laws, talking about inference runs costing millions/billions of dollars, and how much capability will be unlocked this way.
    The o-series paradigm would use this compute to, essentially, perform babble-and-prune search. The pruning would seem to be done by some easily-swappable evaluator (either the system’s own judgement based on the target specified in a prompt, or an external theorem-prover, etc.).
    If things do indeed go this way, then it would seem that a massive amount of capabilities will be based on highly inefficient babble-and-prune search, and that this search would be easily retargetable by intervening on one compact element of the system (the prompt, or the evaluator function).
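As a rough illustration of the claim above, here is a toy sketch of babble-and-prune search in which the evaluator is the single swappable component. It is not a description of how the o-series actually works; the names `babble_and_prune`, `propose_integer`, and `closeness_to` are hypothetical and chosen only for this example.

```python
import random
from typing import Callable


def babble_and_prune(
    propose: Callable[[random.Random], int],
    evaluate: Callable[[int], float],
    n_candidates: int = 1000,
    seed: int = 0,
) -> int:
    """Sample many candidates ("babble"), then keep the one the
    evaluator scores highest ("prune")."""
    rng = random.Random(seed)
    candidates = [propose(rng) for _ in range(n_candidates)]
    return max(candidates, key=evaluate)


def propose_integer(rng: random.Random) -> int:
    """Hypothetical generator: blindly guesses an integer in [0, 100]."""
    return rng.randint(0, 100)


def closeness_to(target: int) -> Callable[[int], float]:
    """Hypothetical evaluator factory: higher score means closer to target."""
    return lambda x: -abs(x - target)


# Same generator and same search loop; only the evaluator changes.
near_42 = babble_and_prune(propose_integer, closeness_to(42))
near_7 = babble_and_prune(propose_integer, closeness_to(7))
```

In this toy setup the generator never learns what the target is. Swapping the evaluator is enough to retarget the whole search, and the price is that most sampled candidates are thrown away.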
    - Rohin Shah 29 Dec 2024 17:05 UTC
      8 points
      2
      Parent
      Re: (1), if you look through the thread for the comment of mine that was linked above, I respond to top-down heuristical-constraint-based search as well. I agree the response is different and not just “computational inefficiency”.
      Re: (2), I agree that near-future systems will be easily retargetable by just changing the prompt or the evaluator function (this isn’t new to the o-series, you can also “retarget” any LLM chatbot by giving it a different prompt). If this continues to superintelligence, I would summarize it as “it turns out alignment wasn’t a problem” (e.g. scheming never arose, we never had problems with LLMs exploiting systematic mistakes in our supervision, etc). I’d summarize this as “x-risky misalignment just doesn’t happen by default”, which I agree is plausible (see e.g. here), but when I’m talking about the viability of alignment plans like “retarget the search” I generally am assuming that there is some problem to solve.
      (Also, random nitpick, who is talking about inference runs of billions of dollars???)
      - Thane Ruthenis 29 Dec 2024 17:39 UTC
        4 points
        0
        Parent
        Yup, I read through it after writing the previous response and now see that you don’t need to be convinced of that point. Sorry about dragging you into this.
        I could nitpick the details here, but I think the discussion has kind of wandered away from any pivotal points of disagreement, plus John didn’t want object-level arguments under this post. So I petition to leave it at that.
        Also, random nitpick, who is talking about inference runs of billions of dollars???
        There’s a log-scaling curve, OpenAI have already spent on the order of a million dollars just to score well on some benchmarks, and people are talking about “how much would you be willing to pay for the proof of the Riemann Hypothesis?”. It seems like a straightforward conclusion that if o-series/inference-time scaling works as well as ML researchers seem to hope, there’d be billion-dollar inference runs funded by some major institutions.
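To make the log-scaling argument concrete, here is a toy calculation. It assumes, purely for illustration, that benchmark score grows linearly in the base-10 logarithm of inference spend; the coefficients `a` and `b` are made-up placeholders, not measured values.

```python
import math


def score(cost_usd: float, a: float = 0.0, b: float = 10.0) -> float:
    """Hypothetical log-scaling curve: score = a + b * log10(cost)."""
    return a + b * math.log10(cost_usd)


def cost_for_score(target: float, a: float = 0.0, b: float = 10.0) -> float:
    """Invert the curve: the spend needed to reach a target score."""
    return 10 ** ((target - a) / b)


# Under this assumed curve, every extra b points costs 10x more.
thousand = score(1e3)
million = score(1e6)
billion = score(1e9)
```

Under a curve of this shape, each fixed increment in score multiplies the required spend by a constant factor, which is why continued gains would imply runs that are orders of magnitude more expensive.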
        - Rohin Shah 29 Dec 2024 19:07 UTC
          6 points
          2
          Parent
          OpenAI have already spent on the order of a million dollars just to score well on some benchmarks
          Note this is many different inference runs each of which was thousands of dollars. I agree that people will spend billions of dollars on inference in total (which isn’t specific to the o-series of models). My incredulity was at the idea of spending billions of dollars on a single episode, which is what I thought you were talking about given that you were talking about capability gains from scaling up inference-time compute.
- Noosphere89 27 Dec 2024 17:27 UTC
  11 points
  3
  Parent
  Re the OpenAI o-series and search: my initial prediction is that Q*/MCTS search will work well on problems that are easy to verify and easy to get training data for, and will not work if either of these two conditions is violated. Secondarily, it will rely on the model having good error-correction capabilities in order to use the search effectively. This is why I expect we can make RL capable of superhuman performance on mathematics/programming with some rather moderate schlep/drudge work, and I also expect cost reductions such that it can actually be practical. That said, I’m only giving a 50/50 chance of superhuman performance, as measured by benchmarks in these domains, by 2028.
  I think my main difference from you, Thane Ruthenis, is that I expect costs to fall surprisingly rapidly, though this is admittedly untested.
  This will accelerate AI progress, but not immediately cause an AI explosion. In the more extreme cases, though, this could create something like a scenario where programming companies are founded by a few people smartly managing a lot of programming AIs, and where programming/mathematics experience something like what happened to the news industry with the rise of the internet: a lot of bankruptcy in the middle, the top end winning big, and most people ending up at the bottom end.
  Also, correct point on how a lot of people’s conceptions of search are babble-and-prune, not top-down search like MCTS/Q*/BFS/DFS/A* (not specifically targeted at sunwillrisee):
  By contrast, my understanding is that the sort of search John is talking about retargeting isn’t the brute-force babble-and-prune algorithms, but a top-down heuristical-constraint-based search.
  - Thane Ruthenis 27 Dec 2024 17:53 UTC
    9 points
    0
    Parent
    I’m not strongly committed to the view that the costs won’t rapidly reduce: I can certainly see worlds in which it’s possible to efficiently distill tree-of-thought unrolls into single chains of thought. Perhaps it scales iteratively, where we train an ML model to handle the next layer of complexity by generating big ToTs, distilling them into CoTs, then generating the next layer of ToTs using these more-competent CoTs, etc.
    Or perhaps distillation doesn’t work that well, and the training/inference costs grow exponentially (combinatorially?).
    - Noosphere89 27 Dec 2024 18:07 UTC
      2 points
      2
      Parent
      Yeah, we will have to wait at least several years.
      
      One confound in all of this is that big talent is moving out of OpenAI, which makes me more bearish on the company’s future prospects specifically, though I don’t think it is that much of a detriment to progress towards AGI overall.