In my personal view, ‘Shard theory of human values’ illustrates both the upsides and pathologies of the local epistemic community.
The upsides
- the majority of the claims are true or at least approximately true
- “shard theory” as a social phenomenon reached critical mass, making the ideas visible to the broader alignment community; this works e.g. through in-person conversations, votes on LW, series of posts, ...
- shard theory coined a number of locally memetically fit names or phrases, such as ‘shards’
- part of the success leads some people in the AGI labs to think about mathematical structures of human values, which is an important problem
The downsides
- almost none of the claims which are true are original; most of this was described elsewhere before, mainly in the active inference/predictive processing literature, or in thinking about multi-agent models of the mind
- the claims which are novel usually seem somewhat confused (e.g. that human values are inaccessible to the genome, or the naive RL intuitions)
- the novel terminology is incompatible with the existing research literature, making it difficult for the alignment community to find or understand existing research, and making it difficult for people from other backgrounds to contribute (while this is not the best option for the advancement of understanding, paradoxically, it may be positively reinforced in the local environment, as you get more credit for reinventing stuff under new names than for pointing to relevant existing research)
Overall, ‘shards’ have become so popular that reading at least the basics is probably necessary to understand what many people are talking about.