There’s a lot here, some of it relevant to mechanistic interpretability and some of it not. But addressing your actual specific arguments against mechanistic interpretability (i.e. this section and the next), I think your arguments prove far too much.
For example, your reasoning for why mech interp is a non-starter (“what matters here is the effects that the (changing) internals’ interactions with connected surroundings of the environment have”) is true of essentially any computer program with inputs and outputs. Of your specific arguments in the next section, at least arguments 1, 3, 5, 8, 9, and 10 (and arguably 2, 4, 6, and others) apply equally to any computer program.
Since it’s implausible that you’ve proved that no computer program with inputs and outputs can be usefully understood, I think it’s similarly implausible that you’ve proved that no neural network can be usefully understood from its internals.