Shard theory explicitly makes certain assumptions about how the human brain works, in particular that the genome mostly specifies crude neural reward circuitry and that ~all of the details of the cortex are basically randomly initialized. I think these claims are plausible but uncertain, and quite important for AI safety. I would be excited about more people looking into this question: it seems controversial among neuroscientists and geneticists, and it seems tractable given the wealth of existing neuroscience research.
Note that this pertains to the shard theory of human values, not shard-centric models of how AI values might form. That said, I’m likewise interested in investigation of these assumptions. E.g., how people work is important probabilistic evidence for how AI works, because there are going to be “common causes” behind effective real-world cognition and design choices.
Good point, post updated accordingly.