Re: (1), if you look through the thread for the comment of mine that was linked above, I respond to top-down heuristic-constraint-based search as well. I agree the response is different and not just “computational inefficiency”.
Re: (2), I agree that near-future systems will be easily retargetable by just changing the prompt or the evaluator function (this isn’t new to the o-series; you can also “retarget” any LLM chatbot by giving it a different prompt). If this continues to superintelligence, I would summarize it as “it turns out alignment wasn’t a problem” (e.g. scheming never arose, we never had problems with LLMs exploiting systematic mistakes in our supervision, etc.). I’d summarize this as “x-risky misalignment just doesn’t happen by default”, which I agree is plausible (see e.g. here), but when I’m talking about the viability of alignment plans like “retarget the search”, I’m generally assuming that there is some problem to solve.
(Also, random nitpick, who is talking about inference runs of billions of dollars???)
Yup, I read through it after writing the previous response and now see that you don’t need to be convinced of that point. Sorry about dragging you into this.
I could nitpick the details here, but I think the discussion has kind of wandered away from any pivotal points of disagreement, plus John didn’t want object-level arguments under this post. So I propose we leave it at that.
Also, random nitpick, who is talking about inference runs of billions of dollars???
There’s a log-scaling curve, OpenAI has already spent on the order of a million dollars just to score well on some benchmarks, and people are talking about “how much would you be willing to pay for a proof of the Riemann Hypothesis?”. It seems like a straightforward conclusion that if o-series/inference-time scaling works as well as ML researchers seem to hope, there would be billion-dollar inference runs funded by some major institutions.
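To make the extrapolation concrete, here’s a toy sketch of the kind of log-linear scaling curve I have in mind (the coefficients and the `predicted_score` helper are made up purely for illustration, not fitted to any real benchmark data):

```python
import math

# Toy illustration only: assume benchmark score grows linearly in
# log10(inference spend per run). The intercept/slope below are
# made-up placeholders, not taken from any real scaling results.
A, B = 30.0, 5.0  # hypothetical intercept and slope (points per 10x spend)

def predicted_score(spend_usd: float) -> float:
    """Score predicted by the assumed log-linear inference-scaling curve."""
    return A + B * math.log10(spend_usd)

for spend in (1e6, 1e7, 1e8, 1e9):
    print(f"${spend:,.0f} per run -> predicted score {predicted_score(spend):.0f}")
```

Under an assumed curve like this, each constant increment in score costs roughly 10x more compute, so wanting a few more increments than today’s ~$1M runs deliver puts you in billion-dollar territory for a single run.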
Note that this was many different inference runs, each of which cost thousands of dollars. I agree that people will spend billions of dollars on inference in total (which isn’t specific to the o-series of models). My incredulity was at the idea of spending billions of dollars on a single episode, which is what I thought you were talking about, given that you were talking about capability gains from scaling up inference-time compute.