I don’t think surviving worlds have a plan in the sense Eliezer is looking for.
This seems wrong to me; could you elaborate? Prompt: Presumably you think we do have a plan; it just doesn’t meet Eliezer’s standards. What is that plan?
Eliezer said:
Surviving worlds, by this point, and in fact several decades earlier, have a plan for how to survive. It is a written plan. The plan is not secret. In this non-surviving world, there are no candidate plans that do not immediately fall to Eliezer instantly pointing at the giant visible gaping holes in that plan.
… Key people are taking internal and real responsibility for finding flaws in their own plans, instead of considering it their job to propose solutions and somebody else’s job to prove those solutions wrong. That world started trying to solve their important lethal problems earlier than this. Half the people going into string theory shifted into AI alignment instead and made real progress there. When people suggest a planetarily-lethal problem that might materialize later—there’s a lot of people suggesting those, in the worlds destined to live, and they don’t have a special status in the field, it’s just what normal geniuses there do—they’re met with either solution plans or a reason why that shouldn’t happen, not an uncomfortable shrug and ‘How can you be sure that will happen’ / ‘There’s no way you could be sure of that now, we’ll have to wait on experimental evidence.’
I’m guessing the disagreement is that Yudkowsky thinks the holes are giant, visible, and gaping, whereas you think they are indeed holes but you have some ideas for how to fix them, and at any rate the plan is to work on fixing those holes and to not deploy powerful AGI until those holes are fixed. I’m guessing you agree that it’s bad to meet suggestions for lethal problems with “how can you be sure / we’ll have to wait / shrug” and that instead it’s good for people to start thinking about those problems and designing solutions now.
I guess there’s also the “It is a written plan. The plan is not secret” part. I for one would feel noticeably better if we had a written, non-secret plan.
I think most worlds, surviving or not, don’t have a plan in the sense that Eliezer is asking about.
I do agree that in the best worlds, there are quite a lot of very good plans and extensive analysis of how they would play out (even if it’s not the biggest input into decision-making). Indeed, I think there are a lot of things that the best possible world would be doing that we aren’t, and I’d give that world a very low probability of doom even if alignment was literally impossible-in-principle.
ETA: this is closely related to Richard’s point in the sibling comment.
I think it’s less about how many holes there are in a given plan, and more like “how much detail does it need before it counts as a plan?” If someone says that their plan is “Keep doing alignment research until the problem is solved”, then whether or not there’s a hole in that plan is downstream of all the other disagreements about how easy the alignment problem is. But it seems like, separate from the other disagreements, Eliezer tends to think that having detailed plans is very useful for making progress.
Analogy for why I don’t buy this: I don’t think that the Wright brothers’ plan to solve the flying problem would count as a “plan” by Eliezer’s standards. But it did work.
As far as I understand, Eliezer doesn’t claim that plans are generally very useful for making progress in solving problems. Trial and error usually works very well. But he also says that trial and error will not work for the alignment problem; we have to get it right the first time, so detailed plans are our only hope. This isn’t overconfidence in plans, just a high confidence that the usual trial-and-error approach can’t be used this time.
I’m guessing the disagreement is that Yudkowsky thinks the holes are giant, visible, and gaping, whereas you think they are indeed holes but you have some ideas for how to fix them
I think we don’t know whether various obvious-to-us-now things will work with effort. I think we don’t really have a plan that would work with an acceptably high probability and stand up to scrutiny / mildly pessimistic assumptions.
I would guess that if alignment is hard, then whatever we do ultimately won’t follow any existing plan very closely (whether we succeed or not). I do think it’s reasonably likely to agree with existing plans at a very high level. I think that’s also true even in the much better worlds that do have tons of plans.
at any rate the plan is to work on fixing those holes and to not deploy powerful AGI until those holes are fixed
I wouldn’t say there is “a plan” to do that.
Many people have that hope, and have thought some about how we might establish sufficient consensus about risk to delay AGI deployment for 0.5-2 years if things look risky, how to overcome various difficulties with implementing that kind of delay, and what kinds of more difficult moves might be able to delay significantly longer than that.