Thane Ruthenis comments on The Field of AI Alignment: A Postmortem, and What To Do About It

Thane Ruthenis 29 Dec 2024 16:42 UTC
4 points
0
To lay out my arguments properly:
1. “Search is ruinously computationally inefficient” does not work as a counter-argument against the retargetability of search, because the inefficiency argument applies to babble-and-prune search, not to the top-down heuristical-constraint-based search that was/is being discussed.
  There are valid arguments against easily-retargetable heuristics-based search as well (I do expect many learned ML algorithms to be much messier than that). But this isn’t one of them.
2. ML researchers are currently incredibly excited about the inference-time scaling laws, talking about inference runs costing millions/billions of dollars, and how much capability will be unlocked this way.
  The o-series paradigm would use this compute to, essentially, perform babble-and-prune search. The pruning would seem to be done by some easily-swappable evaluator (either the system’s own judgement based on the target specified in a prompt, or an external theorem-prover, etc.).
  If things will indeed go this way, then it would seem that a massive amount of capabilities will be based on highly inefficient babble-and-prune search, and that this search would be easily retargetable by intervening on one compact element of the system (the prompt, or the evaluator function).
- Rohin Shah 29 Dec 2024 17:05 UTC
  8 points
  2
  Parent
  Re: (1), if you look through the thread for the comment of mine that was linked above, I respond to top-down heuristical-constraint-based search as well. I agree the response is different and not just “computational inefficiency”.
  Re: (2), I agree that near-future systems will be easily retargetable by just changing the prompt or the evaluator function (this isn’t new to the o-series, you can also “retarget” any LLM chatbot by giving it a different prompt). If this continues to superintelligence, I would summarize it as “it turns out alignment wasn’t a problem” (e.g. scheming never arose, we never had problems with LLMs exploiting systematic mistakes in our supervision, etc). I’d summarize this as “x-risky misalignment just doesn’t happen by default”, which I agree is plausible (see e.g. here), but when I’m talking about the viability of alignment plans like “retarget the search” I generally am assuming that there is some problem to solve.
  (Also, random nitpick, who is talking about inference runs of billions of dollars???)
  - Thane Ruthenis 29 Dec 2024 17:39 UTC
    4 points
    0
    Parent
    Yup, I read through it after writing the previous response and now see that you don’t need to be convinced of that point. Sorry about dragging you into this.
    I could nitpick the details here, but I think the discussion has kind of wandered away from any pivotal points of disagreement, plus John didn’t want object-level arguments under this post. So I petition to leave it at that.
    Also, random nitpick, who is talking about inference runs of billions of dollars???
    There’s a log-scaling curve, OpenAI have already spent on the order of a million dollars just to score well on some benchmarks, and people are talking about “how much would you be willing to pay for the proof of the Riemann Hypothesis?”. It seems like a straightforward conclusion that if o-series/inference-time scaling works as well as ML researchers seem to hope, there’d be billion-dollar inference runs funded by some major institutions.
    - Rohin Shah 29 Dec 2024 19:07 UTC
      6 points
      2
      Parent
      OpenAI have already spent on the order of a million dollars just to score well on some benchmarks
      Note this is many different inference runs each of which was thousands of dollars. I agree that people will spend billions of dollars on inference in total (which isn’t specific to the o-series of models). My incredulity was at the idea of spending billions of dollars on a single episode, which is what I thought you were talking about given that you were talking about capability gains from scaling up inference-time compute.