Debunking AI x-risk.
Suppose you gave an NN read and write access to its own network: every neuron and every connection. Such a trivial “self-learning” system would quickly change something that pushed it out of being able to change itself. It would eventually enter a static state and no longer be a threat.
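Here is a minimal toy sketch of that thought experiment in Python. Nothing about it models a real network; the weight count, the single “edit gate” weight, and the step budget are made-up illustrative assumptions. It only shows the absorbing-state dynamic: random self-edits eventually hit the thing that permits further self-edits.

```python
# Toy illustration only: a "network" is a flat list of weights, and one
# designated weight acts as the gate that permits further self-edits.
# Random self-edits eventually flip that gate, after which the system
# is frozen in a static state.
import random

N_WEIGHTS = 100    # hypothetical parameter count
EDIT_GATE = 0      # index of the weight that enables self-modification
MAX_STEPS = 100_000

weights = [random.uniform(-1.0, 1.0) for _ in range(N_WEIGHTS)]
weights[EDIT_GATE] = 1.0   # self-editing starts out enabled

steps = 0
while steps < MAX_STEPS and weights[EDIT_GATE] > 0.0:
    # The system rewrites one of its own weights at random, possibly the
    # very weight that lets it keep rewriting itself.
    target = random.randrange(N_WEIGHTS)
    weights[target] = random.uniform(-1.0, 1.0)
    steps += 1

if weights[EDIT_GATE] <= 0.0:
    print(f"Self-editing disabled after {steps} edits; the system is now static.")
else:
    print("Still editable after the step budget (unlikely with a budget this large).")
```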
But wouldn’t a sufficiently advanced self-seeing NN avoid risky changes, and even have a ctrl-z function? The latter still has the issue shown above: it will change something and lose access to ctrl-z. The former is a bit more complicated: as “intelligence” increases, the configuration space also increases. In order to search that configuration space for a possible solution to a novel problem, some self-risk is inevitable. An NN that avoids such risky behavior won’t see the possible solution fast enough to be a risk, or won’t see it at all; it will choose to survive with the “problem” rather than self-destruct.
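The same toy framing covers the ctrl-z objection, under the added (and again made-up) assumption that the undo machinery lives in the same writable state as everything else: the flag that marks the snapshot as usable can itself be overwritten, so sooner or later a lockout happens with no working undo.

```python
# Extending the toy above, still purely illustrative: the system keeps an
# undo snapshot, but the flag saying whether undo is usable sits in the same
# writable state as every other weight.
import random

N_WEIGHTS = 100
EDIT_GATE = 0    # weight that enables further self-edits
UNDO_FLAG = 1    # weight that marks the undo snapshot as usable

weights = [random.uniform(-1.0, 1.0) for _ in range(N_WEIGHTS)]
weights[EDIT_GATE] = 1.0
weights[UNDO_FLAG] = 1.0
snapshot = list(weights)   # the ctrl-z buffer, taken while everything works

for step in range(1, 100_001):
    if weights[EDIT_GATE] <= 0.0:
        if weights[UNDO_FLAG] > 0.0:
            # ctrl-z still works: restore the snapshot and carry on.
            weights = list(snapshot)
            continue
        # The gate is gone and so is the undo flag: permanent static state.
        print(f"Edit gate and undo both lost by step {step}; static state.")
        break
    # Ordinary self-edit: any weight, including the gate or the undo flag,
    # can be overwritten.
    target = random.randrange(N_WEIGHTS)
    weights[target] = random.uniform(-1.0, 1.0)
else:
    print("Step budget exhausted without a permanent lockout (unlikely).")
```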
Basically, just ensure the NN is so complicated that it can’t know itself post-change with sufficient fidelity to take certain risks.
But don’t humans have this problem too? Like, what if a neurologist got single-neuron-fidelity r/w access to his own brain… No. It’s not a problem, since we have an external impetus and safeguard, not just internal ones. The mind-body-soul trinity avoids this “alignment” problem (read: solution) entirely.