That is, your frame here is something like “planning is hard, therefore you should distrust alignment plans”.
But you could just as easily frame this as “abstract reasoning about unfamiliar domains is hard, therefore you should distrust doom arguments”.
I think the received wisdom in cryptography is “don’t roll your own crypto system”. I think this comes from a long history of overconfident people doing exactly that, and then other people repeatedly discovering major flaws in what they built.
The lesson is not “Reasoning about a crypto system you haven’t built yet is hard, and therefore it’s equally reasonable to say ‘a new system will work well’ and ‘a new system will work badly’.” Instead, it’s “Your new system will probably work badly.”
I think the underlying model is that there are lots of different ways for your new crypto system to be flawed, and you have to get all of them right, or else the optimizing intelligence of your rivals (ideally) or the malware industry (less ideally) will find the security hole and exploit it. If there are ten things you need to get right, and you have a 30% chance of screwing up each one, then the chance of complete success is about 2.8%. So if there’s a general fog of “it’s hard to reason about these things before building them”, such that you can’t say in advance that the chance of failure in each component is below 30%, then that fog points asymmetrically towards overall failure.
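To make the arithmetic concrete, here’s a minimal sketch of that toy model (assuming, as the analogy implicitly does, that the failure points are independent; the function name is mine, not from the original):

```python
# Overall success probability for a system with n must-get-right components,
# each with an independent per-component failure probability p.
def overall_success(n: int, p: float) -> float:
    return (1 - p) ** n

# The example from the text: ten components, 30% chance of screwing up each.
print(f"{overall_success(10, 0.30):.1%}")  # ~2.8%

# Even much better per-component odds still decay fast at n = 10:
for p in (0.05, 0.10, 0.30):
    print(p, f"{overall_success(10, p):.1%}")  # ~59.9%, ~34.9%, ~2.8%
```

The point of the sketch is just the exponent: when every component must succeed, uncertainty about each one compounds multiplicatively against the whole.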
I think Raemon’s model (and pretty certainly Eliezer’s) is indeed that an alignment plan is, in large part, like a security system: there are lots of potential weaknesses, any one of which could torpedo the whole thing, and those weaknesses will be sought out by an optimizing intelligence. Perhaps your model is different?