Ah, but perhaps your objection is that the difficulty of the AI alignment problem suggests that we do in fact need the analog of perfect zero correlation in order to succeed.
My objection is actually mostly to the example itself.
As you mention:
the idea is not to try and contain a malign AGI which is already not on our side. The plan, to the extent that there is one, is to create systems that are on our side, and apply their optimization pressure to the task of keeping the plan on-course.
Compare with the example:
Suppose we’re designing some secure electronic equipment, and we’re concerned about the system leaking information to adversaries via a radio side-channel.
[...]
But what if we instead design the system so that the leaked radio signal has zero mutual information with whatever signals are passed around inside the system? Then it doesn’t matter how much optimization pressure an adversary applies, they’re not going to figure out anything about those internal signals via leaked radio.
This is analogous to the case of… trying to contain a malign AI which is already not on our side.
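For what it's worth, here is a minimal sketch (my gloss, not something John spells out in the post) of why the zero-mutual-information condition really is robust to arbitrary optimization pressure: any decoder the adversary might optimize is just post-processing of the leaked signal, so the data processing inequality applies.

```latex
% X = internal signals, Y = leaked radio signal,
% f = any (however cleverly optimized) adversary decoder applied to Y.
% Since f only sees Y, the chain X -> Y -> f(Y) is Markov, so:
\[
I\bigl(X; f(Y)\bigr) \;\le\; I(X; Y).
\]
% If the design guarantees I(X; Y) = 0, then for every possible f,
\[
0 \;\le\; I\bigl(X; f(Y)\bigr) \;\le\; I(X; Y) \;=\; 0,
\]
% i.e. no amount of adversarial optimization over the decoder f
% recovers anything about the internal signals X.
```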
Fair enough! I admit that John did not actually provide an argument for why alignment might be achievable by “guessing true names”. I think the approach makes sense, but my reasons for thinking so differ from John’s arguments here.