A clarification about the sense in which I claim "biological and artificial neural networks are based upon the same fundamental principles":
I would not be surprised if the reasons why neural networks “work” are also exploited by the brain.
In particular, I think neuroscience is a good route to value alignment because we can expect the values-related parts of the brain to be compatible with those same principles, so implementing them in artificial networks shouldn't require many additional fundamental advances. Contrast this with, say, corrigibility, which will first be developed for ideal utility maximizers and will then require a mapping from that formalism onto neural networks, which seems potentially just as hard as writing an AGI from scratch.
If human values turn out to be incompatible with artificial neural networks, then I also become much more pessimistic about all alternative approaches to value-aligning neural networks.
Does this comment clear up my claim?
It helps a little, but I feel like we're operating at too high a level of abstraction.