I don’t think the analogy to biological brains is quite as strong. For example, biological brains need to be “robust” not only to variations in the input, but also in a literal sense, to forceful impact or to parasites trying to control them. They have very bad suppressibility by design, and this means there needs to be a lot of redundancy, which is what makes “just stick an electrode in that area” work. More generally, a brain is under many constraints that an ML system isn’t, probably too many for us to think of, and it generally prioritizes safety over performance. Both lead away from the sort of maximally efficient compression that makes ML systems hard to interpret.
Analogously: Imagine a programmer wrote the shortest program that does a given task. That would be terrible. It would be impossible to change anything without essentially redesigning everything, trying to understand what it does just from reading the code would be very hard, and giving a compressed explanation of how it does it would be impossible. In practice, we don’t write code like that, because we face constraints like those mentioned above, but it’s very easy to imagine that some optimization-based “automatic coder” would program like that. Indeed, on the occasions when we really need to optimize runtimes, we move in that direction ourselves.
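To make the contrast concrete, here is a toy sketch in Python. The task (FizzBuzz) and both function names are my own illustration, and the short version is only "short by hand", not a true shortest program. It is just meant to show the direction things go when length is the only thing being optimized.

```python
# Version optimized only for brevity: everything is fused into one expression
# that relies on bool-times-string and the falsiness of the empty string.
def fizzbuzz_golfed(n):
    return ["Fizz"*(i%3<1)+"Buzz"*(i%5<1) or str(i) for i in range(1, n+1)]


# Version written under ordinary human constraints: the rules are separate
# data, so one can be added or changed without touching the loop.
RULES = [(3, "Fizz"), (5, "Buzz")]


def fizzbuzz_readable(n, rules=RULES):
    lines = []
    for i in range(1, n + 1):
        # Concatenate the word of every rule whose divisor divides i.
        words = "".join(word for divisor, word in rules if i % divisor == 0)
        # Fall back to the number itself when no rule applies.
        lines.append(words or str(i))
    return lines
```

Both return the same list, but only the second has parts you can point at and say “this bit holds the rules, this bit applies them”. The redundancy and separation are what make it legible, and they are exactly what a pure length-minimizer would strip out.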
So I don’t think brains tell us much about the interpretability of the standard, highly optimized neural nets.
I think your assessments of what’s psychologically realistic are off.
I think before writing that, Yud imagined calling [unambiguously gendered friend] either pronoun, and asked himself if it felt wrong, and found that it didn’t. This seems realistic to me: I’ve experienced my emotional introspection becoming blank on topics I’ve put a lot of thinking into. This doesn’t prevent doing the same automatic actions you always did, or knowing what those would be in a given situation. If something like this happened to him for gender long enough ago, he may well not be able to imagine otherwise.
It’s unreasonable, but it seems totally plausible that on one occasion you would feel like you know someone has a certain name, and continue feeling that way even after being rationally convinced you’re wrong. That there are many names only means that the odds of any particular name featuring in such a situation are low, not that the class as a whole has low odds, and I don’t see why the prior for that would be lower than for e.g. mistaken déjà vu experiences.