It seems like in practice this kind of internal algorithmic inequivalence is always detectable. That is, for a given human you could always figure out which of the four possibilities is occurring just by feeding the black box different inputs and observing its outputs, and the possibilities would diverge in meaningful behavioral ways in the appropriate circumstances.
It also seems like the reason I have the intuition that those four cases are “different” is that I expect them to produce different outside-the-black-box behaviors when you vary the inputs. Is there a concrete example where an internal structural difference cannot be detected outside the box? It’s not obvious to me that I would care about such a difference.
It’s detectable here because the algorithms are clean and simple as laid out. Make them a bit messier, add a few almost-irrelevant cross-connections, and it becomes a lot harder.
In theory, of course, you could run an entire world self-contained inside an algorithm, and algorithmic equivalence would then argue that this inner world is irrelevant.
And in practice, what I’m aiming for is to use “human behaviour + brain structure + fMRI outputs” to get more than just “human behaviour”. It might be that those are equivalent in the limit of a super AI that can analyse every counterfactual universe, yet different in practice for real AIs.
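A minimal sketch of the distinction being drawn (my own illustration, in Python; the function names and the “internal scan” are assumptions, not anything from the discussion above): two black boxes with identical input-output behaviour, where only access to internal state, the analogue of brain structure or fMRI data, tells them apart.

```python
def box_memo(x, _cache={}):
    """Computes x**2 via a lookup table it builds as it goes."""
    if x not in _cache:
        _cache[x] = x * x
    return _cache[x]

def box_direct(x):
    """Computes x**2 directly every time."""
    return x * x

# Behaviourally equivalent: no sequence of inputs distinguishes the outputs.
assert all(box_memo(x) == box_direct(x) for x in range(100))

# But an "internal scan" (here: inspecting the cache) does distinguish them,
# much as brain-structure or fMRI data might distinguish algorithms that look
# identical from the outside.
internal_state = box_memo.__defaults__[0]
print(len(internal_state))  # nonzero for box_memo; box_direct keeps no such state
```

The point of the sketch is only that behavioural data alone cannot separate the two implementations, while even a crude window onto internal state can.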