I am not remotely qualified to judge whether the technical claims about NV centers are correct.
On the margin, given what we understand about neurology today, brain interpretability research will probably have positive effects. But I think those effects are second or third order next to a much scarier first-order concern: a nontrivial amount of probability mass on vastly negative effects in the longer run.
I think I would be willing to trade away the medical advances we might get from interpretability of small brain circuits in order to shift probability mass away from a future where the state of a live human brain can be imaged, tinkered with at runtime, stopped, edited, stepped forward in a small loop a million times over with minor perturbations, and analyzed for artifacts. There are almost certainly input signals to the brain every bit as weird as “glitch tokens” just waiting to be found.
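To make concrete how mundane that loop would be once a runnable snapshot exists, here is a deliberately toy sketch in Python. Everything in it is a stand-in I invented (the step dynamics, the artifact check, the tiny 64-unit state); it is the shape of the control flow that matters, not the details.

```python
import math
import random

def step(state):
    # Stand-in dynamics: each unit mixes with its neighbor through a
    # saturating nonlinearity. Any deterministic update rule would do.
    neighbors = state[1:] + state[:1]
    return [math.tanh(3 * x + 2 * y) for x, y in zip(state, neighbors)]

def has_artifact(state):
    # Stand-in anomaly check: did the perturbation drive every unit
    # into saturation (the toy analogue of finding a "glitch token")?
    return all(abs(x) > 0.999 for x in state)

random.seed(0)
snapshot = [random.uniform(-1, 1) for _ in range(64)]  # the captured image

anomalies = []
for trial in range(1000):                # "a million times over" in the scenario
    state = list(snapshot)               # restore the same mind-state each time
    i = random.randrange(len(state))
    state[i] += random.uniform(-2, 2)    # minor input perturbation
    for _ in range(100):                 # step the loop forward
        state = step(state)
    if has_artifact(state):
        anomalies.append(i)

print(f"{len(anomalies)} perturbations produced artifacts")
```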
Again, this is far off in the future, but I worry that “we need better brain interpretability” elicits the same gut feeling in me as: Scientists have tried for years to create the “horror sprocket,” the first of seventeen important technical breakthroughs depicted in the classic sci-fi novel “Don’t create the torment nexus.” Fortunately, recent advances in sprocket engineering have resolved some of the key real-world obstacles!
I’m having a hard time articulating exactly why the term “brain interpretability” terrifies me, but I think it is something like this. I believe I am the information processing routine running in my brain, and I believe that a version of that routine implemented on different hardware is still “me,” or at least a “me.” To the extent any value is sacred to me, “I am the only entity that should have anything resembling root access to this information processing routine” is one. Anyone with full access to the connectome and the set of weights (and whatever other parameters are needed) has, given sufficient intelligence and computing power, root access. Any technology that makes it easier to obtain and understand those weights is, in expectation, horrifying to me.
As a side remark, Richard Ngo’s offhand mention of interpretability in “Masterpiece” was, at least for me, the most upsetting addition to “Lena,” the excellent qntm short story that comes closest to an actual “Don’t create the torment nexus”: https://qntm.org/lena