“I have to be wrong about something, which I certainly am. I have to be wrong about something which makes the problem easier rather than harder, for those people who don’t think alignment’s going to be all that hard. If you’re building a rocket for the first time ever, and you’re wrong about something, it’s not surprising if you’re wrong about something. It’s surprising if the thing that you’re wrong about causes the rocket to go twice as high, on half the fuel you thought was required and be much easier to steer than you were afraid of.”
I agree with OP that this rocket analogy from Eliezer is a bad analogy, AFAICT. If someone is trying to assess the difficulty of solving a technical problem (e.g. building a rocket) in advance, then they need to brainstorm potential problems that might come up, and when they notice one, they also need to brainstorm potential technical solutions to that problem. For example “the heat of reentry will destroy the ship” is a potential problem, and “we can invent new and better heat-resistant tiles / shielding” is a potential solution to that problem. During this process, I don’t think it’s particularly unusual for the person to notice a technical problem but overlook a clever way to solve that problem. (Maybe they didn’t recognize the possibility of inventing new super-duper-heat-resistant ceramic tiles, or whatever.) And then they would wind up overly pessimistic.
During this process, I don’t think it’s particularly unusual for the person to notice a technical problem but overlook a clever way to solve that problem.
I think this isn’t the claim; I think the claim is that it would be particularly unusual for someone to overlook that they’re accidentally solving a technical problem. (It would be surprising for Edison to pick tungsten without having thought hard about what filament to use; in actual history, it took decades for that change to be made.)
Sure, but then the other side of the analogy doesn’t make sense, right? The context was: Eliezer was talking in general terms about the difficulty of the AGI x-risk problem and whether it’s likely to be solved. (As I understand it.)
[Needless to say, I’m just making a narrow point that it’s a bad analogy. I’m not arguing that p(doom) is high or low, I’m not saying this is an important & illustrative mistake (talking on the fly is hard!), etc.]
So I definitely think that’s something weirdly unspoken about the argument; I would characterize it as Eliezer saying “suppose I’m right and they’re wrong; all this requires is for things to be harder than people think, which is usual. Suppose instead that I’m wrong and they’re right; this requires things to be easier than people think, which is unusual.” But the equation of “people” and “Eliezer” is sort of strange; as Quintin notes, it isn’t that unusual for outside observers to overestimate difficulty, and so I wish he had centrally addressed the reference class tennis game: is the expertise “getting AI systems to be capable” or “getting AI systems to do what you want”?
(Maybe they didn’t recognize the possibility of inventing new super-duper-heat-resistant ceramic tiles, or whatever.) And then they would wind up overly pessimistic.
Basically, this is what I think happened with AI alignment: just replace ridiculously good heat-resistant tiles with Pretraining from Human Feedback, and the analogy works here.
It wasn’t inevitable or even super likely that this would happen, or that we could have an alignment goal that gets better with capabilities by default, but we found one, and this makes me way more optimistic on alignment than I used to be.
I disagree but won’t argue here. IMO it’s off-topic.