Your distinction between “outer alignment” and “inner alignment” is both ahistorical and unYudkowskian. It was invented years after this post was written, by someone who wasn’t me; and though I’ve sometimes used the terms on occasions where they seem to fit unambiguously, it’s not something I see as a clear ontological division, especially if you’re talking about questions like “If we own the following kind of blackbox, would alignment get any easier?” which on my view breaks that ontology. So I strongly reject your frame that this post was “clearly portraying an outer alignment problem” and can be criticized on those grounds by you; that is anachronistic.
You are now dragging in a very large number of further inferences about “what I meant”, and other implications that you think this post has, which are about Christiano-style proposals that were developed years after this post. I have disagreements with those, many disagreements. But it is definitely not what this post is about, one way or another, because this post predates Christiano being on the scene.
What this post is trying to illustrate is that if you try putting crisp physical predicates on reality, that won’t work to say what you want. This point is true! If you then want to take in a bunch of anachronistic ideas developed later, and claim (wrongly imo) that this renders irrelevant the simple truth of what this post actually literally says, that would be a separate conversation. But if you’re doing that, please distinguish the truth of what this post actually says versus how you think these other later clever ideas evade or bypass that truth.
Well-checked.