The basic Umeshism: if you’re not failing sometimes, you’re not trying problems that are hard enough.
Well, or you’re trying problems that you can’t afford to fail at. If a trapeze artist doesn’t fall during 50% of their no-net performances, should they attempt a harder routine?
That’s the point. SpaceX can afford to fail at this; the decision makers know it. Eliezer can afford to fail at tweet writing and knows it. So they naturally ratchet up the difficulty of the problem until they’re working on problems that maximize their expected return (in utility, not necessarily dollars). At least approximately. And then fail sometimes.
Or, for the trapeze artist… how long do they keep practicing? Do they do the no-net routine when they estimate their odds of failure are 1/100? 1/10,000? 1e-6? They don’t push the odds to zero; at some point they make a call, accept the risk, and go.
Why should it be any different for an entity that can one-shot those problems? Why would they wait until they had invested enough effort to one-shot it, and then do so, when instead they could just… invest less effort, attempt it earlier, take some risk of failure, and reap a greater expected reward?
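To make that tradeoff concrete, here is a minimal toy sketch; the functional form and every number in it are made-up assumptions, purely for illustration. If extra preparation lowers the failure odds but carries a cost, the preparation level that maximizes expected utility generally leaves the failure probability well above zero.

```python
# Toy model of "prepare more vs. attempt earlier and accept some failure risk".
# Every number here is an illustrative assumption, not an estimate for any real actor.

def expected_utility(months_prep, payoff=100.0, cost_per_month=2.0,
                     base_fail=0.5, halving_time=3.0):
    """Expected utility of attempting after `months_prep` months of preparation.

    Assumes the failure probability halves every `halving_time` months of prep,
    while each month of prep costs `cost_per_month` in delay/opportunity cost.
    """
    p_fail = base_fail * 0.5 ** (months_prep / halving_time)
    return (1 - p_fail) * payoff - months_prep * cost_per_month

best = max(range(61), key=expected_utility)
p_at_best = 0.5 * 0.5 ** (best / 3.0)
print(f"optimal prep: {best} months, failure odds at attempt: {p_at_best:.3f}")
# With these numbers the optimum is ~8 months of prep and roughly an 8% failure
# chance per attempt: the expected-utility maximizer does not push risk to zero.
```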
The analogy suggests that entities capable of one-shotting problem X (presumably, by putting in a lot of preparatory effort, running analysis, and so on) will do so. I don’t think that’s true.
(And I think the tweet-writing problem is an especially strong example of this—hypercompetitive social environments absolutely produce problems calibrated to be barely solvable and that scale with ability, assuming your capability is in line with the other participants’, which I assert is the case for Eliezer. He might be smarter / better at writing tweets than most, but he’s not that far ahead.)
SpaceX can afford to fail at this; the decision makers know it.
Well, to be fair, the post is making the point that perhaps they can afford it less than they thought. They completely ignored the effects their failure would have on the surrounding communities (which reeks of conceit on their part), and now they’re paying the price with the risk of a disproportionate crackdown. It’ll cost them more than they expected, for sure.
The analogy suggests that entities capable of one-shotting problem X (presumably, by putting in a lot of preparatory effort, running analysis, and so on) will do so. I don’t think that’s true.
You’re right, but I think the analogy is also saying that if we were capable enough to one-shot AGI (which, according to EY, we need to do), then we would surely be capable enough to very cheaply one-shot a Starship launch, because it’s a simpler problem. Failure may be a good teacher, but it’s not a free one. If you’re competent enough to one-shot things with only a tiny bit of additional effort, you do it. Having this failure rate instead shows that you’re already straining at the very limit of what’s possible, and that limit is apparently… launching big rockets. Which, while awesome in a general sense, is really, really child’s play compared to getting superhuman AGI right, and on that estimate I do agree with Yud.
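Reusing the toy sketch from above with one made-up number changed: when extra preparation is nearly free relative to the payoff, the expected-utility optimum drives the failure probability essentially to zero, which is the “one-shot it if it only takes a tiny bit of additional effort” case. A stubbornly high failure rate is what the model produces when the extra effort is genuinely expensive, i.e. when you’re operating near your limit.

```python
# Same toy model as above, but preparation is nearly free
# (cost_per_month = 0.01 instead of 2.0). Still purely illustrative numbers.

def expected_utility(months_prep, payoff=100.0, cost_per_month=0.01,
                     base_fail=0.5, halving_time=3.0):
    p_fail = base_fail * 0.5 ** (months_prep / halving_time)
    return (1 - p_fail) * payoff - months_prep * cost_per_month

best = max(range(241), key=expected_utility)
print(f"optimal prep: {best} months, failure odds: {0.5 * 0.5 ** (best / 3.0):.4f}")
# Now the optimum is ~31 months of prep with failure odds around 0.0004:
# when one-shotting is cheap, the expected-utility maximizer effectively one-shots.
```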
I would add that a huge part of solving alignment requires being keenly aware of and caring about human values in general, and in that sense, the sort of mindset that leads to not foreseeing or giving a damn about how pissed off people would be by clouds of launchpad dust in their towns really isn’t the culture you want to bring into AGI creation.