The first correction I’d offer is that I wouldn’t present this as purely springing from Quintin and Alex—“Shard Theory” is mostly a sexy name for Steve Byrnes’ picture of drives in the brain, together with some suggestive language identifying “shards” as agents, and some claims about how it applies to AI alignment.
Definitely agree it’s not purely springing from us. Shard theory inherits an enormous amount from Steve’s picture of the brain, and from a generous number of private communications with Steve this spring. I don’t know of any neuroscience on which I disagree with Steve (and he knows a lot more about it than I do). I think we share similar neuroscientific assumptions. But I’d add two clarifications:
First, shard theory has substantially different emphases (multi-optimizer dynamics, values handshakes among shards, origins of self-reflective modeling, origins of biases, moral reflection as shard deliberation, terminalization of instrumental “activation-level” values into “weight-level” shards of their own). I also consider myself to be taking speculation to different places: it seems like shard theory should apply more generally, to AI systems as well as to anything which satisfies its assumptions, modulo second-order effects of inductive biases.
Second, I don’t view shards as agents. As I wrote in the main essay:
“We think that shards are not discrete subagents with their own world models and mental workspaces. We currently estimate that most shards are ‘optimizers’ to the extent that a bacterium or a thermostat is an optimizer.”
Quintin: yes, indeed, one of the reasons I was excited about Shard Theory was that it has these different emphases you mention (e.g. ‘multi-optimizer dynamics, values handshakes among shards, origins of self-reflective modeling, origins of biases, moral reflection as shard deliberation’), which I thought might be useful to develop and integrate with evolutionary psychology and other branches of psychology, not just with AI alignment.
So I wanted to see if Shard Theory could be made a little more consistent with the theories and findings of behavior genetics and ev psych, so it could have more impact in those fields. (Both fields can get a little prickly about people ignoring their theories and findings, since they’ve been demonized for ideological reasons since the 1970s and 1990s, respectively.)
Indeed, you might find quite a few similarities and analogies between certain elements of Shard Theory and certain traditional notions in evolutionary psychology, such as domain-specificity, adaptive hypocrisy and adaptive self-deception, internal conflicts between different adaptive strategies, and satisficing of fitness proxies as instrumentally convergent goals rather than attempting to maximize fitness itself as a terminal value. Shard Theory can potentially offer some new perspectives on those traditional concepts, in light of modern reinforcement learning theory in machine learning.