I think Eliezer’s tweet is wrong even if you grant the rocket <> alignment analogy (unless you grant some much more extreme background views about AI alignment).
Assume that “deploy powerful AI with no takeover” is exactly as hard as “build a rocket that flies correctly the first time even though it has 2x more thrust than anything anyone has tested before.” Assume further that an organization is able to do one of those tasks if and only if it can do the other.
Granting the analogy, the relevant question is how much harder it would be to successfully launch and land a rocket the first time without doing tests of any similarly-large rockets. If you tell me it increases costs by 10% I’m like “that’s real but manageable.” If you tell me it doubles the cost, that’s a problem but not anywhere close to doom. If it 10x’s the cost then you’d have to solve a hard political problem.
The fact that SpaceX failed probably tells us that it costs at least 1%, or maybe even 10%, more to develop Starship without ever failing. It doesn’t really tell us much beyond that. I don’t see any indication that they were surprised this failed or that they took significant pains to avoid a failure. The main thing commenters have pushed back on is that this isn’t a mistake in SpaceX’s case, so it’s not helpful evidence about the difficulty of doing something right the first time.
(In fact I’d guess that never doing a test costs much more than 10% extra, but this launch isn’t a meaningful part of the evidence for that.)
Granting the analogy, Eliezer could help himself to a much weaker conclusion:
> The fastest and easiest possible way for Elon Musk to build an AI would lead to an AI takeover. He’s not so good at science that “trial and error,” on the actual problem you care about rather than analogies and warmups, doesn’t significantly reduce costs.
But saying “the fastest possible way for Bob to make an AI would lead to an AI takeover” does not imply that “Bob is not qualified to run an AGI company.” Instead it just means that Bob shouldn’t rely on his company doing the fastest and easiest thing and having it turn out fine. Instead Bob should expect to make sacrifices, either burning down a technical lead or operating in (or helping create) a regulatory environment where the fastest and easiest option isn’t allowed.
I suspect what’s really happening is that Eliezer thinks AGI alignment is much harder than successfully launching and landing a rocket the first time. So if getting it right the first time increases costs by 10% for a rocket, it will increase costs by 1,000% for an AGI.
But if that’s the case then the key claim isn’t “solving problems is hard when you can’t iterate.” The key claim is that solving alignment (and learning from safe scientific experiments) is much harder than in other domains, so much harder that a society that can solve alignment will never need to learn from experience for any normal “easy” engineering problem like building rockets. I think that’s conceivable but I’d bet against it. Either way, it’s not surprising that people will reject the analogy since it’s based on a strong implicit claim about alignment that most people find outlandish.
> Assume that “deploy powerful AI with no takeover” is exactly as hard as “build a rocket that flies correctly the first time even though it has 2x more thrust than anything anyone has tested before.”
I think you are way underestimating. A more reasonable guess is that expected odds of the first Starship launch failure go down logarithmically with budget and time. Even if you grant a linear relationship, reducing the odds of failure from 10% to 1% means 10x the budget and time. If you want to never fail, you need an infinite budget and time. If the failure results in an extinction event, then you are SOL.
> A more reasonable guess is that expected odds of the first Starship launch failure go down logarithmically with budget and time.
That’s like saying that it takes 10 people to get 90% reliability, 100 people to get to 99% reliability, and a hundred million people to get to 99.99% reliability. I don’t think it’s a reasonable model though I’m certainly interested in examples of problems that have worked out that way.
Linear is a more reasonable best guess. I have quibbles, but I don’t think it’s super relevant to this discussion. I expect the Starship first-launch failure probability was >>90%, and we’re talking about the difficulty of getting out of that regime.
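To make the scaling disagreement concrete, here is a minimal sketch of the inverse-cost model described above (the one where cutting failure odds from 10% to 1% takes 10x the budget). The baseline numbers are purely illustrative, not from either comment:

```python
# Minimal sketch of the inverse-cost model discussed above: failure odds
# scale like 1/budget, so cutting them from 10% to 1% takes 10x the budget,
# and "never fail" is unreachable at any finite cost. Numbers are illustrative.

def budget_multiplier(target_failure_odds, baseline_failure_odds=0.10):
    """Budget multiple needed, assuming failure odds are proportional to 1/budget."""
    if target_failure_odds <= 0:
        return float("inf")  # zero failure odds would require an unbounded budget
    return baseline_failure_odds / target_failure_odds

for p in (0.10, 0.01, 0.001, 0.0):
    print(f"target failure odds {p:.3f}: ~{budget_multiplier(p):g}x the baseline budget")
```

Under this model the multiplier grows like 1/p, which is steep but finite for any nonzero failure target; the p = 0 row corresponds to the commenter’s point that never failing, in the extinction case, can’t be bought with any finite budget.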
> That’s like saying that it takes 10 people to get 90% reliability, 100 people to get to 99% reliability, and a hundred million people to get to 99.99% reliability. I don’t think it’s a reasonable model though I’m certainly interested in examples of problems that have worked out that way.
Conditional on it being a novel and complicated design. I routinely churn out six-sigma code when I know what I am doing, and so do most engineers. But almost never on the first try! The feedback loop is vital, even if it is slow and inefficient. For anything new you are fighting not so much the design as human fallibility. Eliezer’s point is that if you have only one try to succeed, you are hooped. I do not subscribe to the first part; I think we have plenty of opportunities to iterate as LLM capabilities ramp up. But conditional on “perfect first try or extinction”, our odds of survival are negligible. There might be alignment by default, or some other way out, but conditional on that one assumption, we have no chance in hell.
It seems to me that you disagree with that point, somehow: that by pouring more resources upfront into something novel, we have good odds of succeeding on the first try, open loop. That is not a tenable assumption, so I assume I misunderstood something.
I agree you need feedback from the world; you need to do experiments. If you wanted to get a 50% chance of launching a rocket successfully on the first try (at any reasonable cost), you would need to do experiments.
The equivocation between “no opportunity to experiment” and “can’t retry if you fail” is doing all the work in this argument.
> Instead it just means that Bob shouldn’t rely on his company doing the fastest and easiest thing and having it turn out fine. Instead Bob should expect to make sacrifices, either burning down a technical lead or operating in (or helping create) a regulatory environment where the fastest and easiest option isn’t allowed.
The above feels so bizarre that I wonder if you’re trying to reach Elon Musk personally. If so, just reach out to him. If we assume there’s no self-reference paradox involved, we can safely reject your proposed alternatives as obviously impossible; they would have zero credibility even if AI companies weren’t in an arms race, which appears impossible to stop from the inside unless all the CEOs involved can meet at Bohemian Grove.
There are many industries where it is illegal to do things in the fastest or easiest way. I’m not exactly sure what you are saying here.
Even focusing on that doesn’t make your claim appear sensible, because such laws will not happen soon enough, nor in a sufficiently well-aimed fashion, without work from people like the speaker. You also implied twice that tech CEOs would take action on their own (the quote is in the grandparent), and in the parent you act like you didn’t make that bizarre claim.