Until someone can propose actual hardware or software designs that implement “shard theory” concepts without just becoming an obfuscated reward function prone to the same failure modes as everything else, it’s not especially useful to me. However, I think it’s worth engaging with the idea, because if it is correct, other research directions might be dead ends.
Have you read “A shot at the diamond alignment problem”? If so, what do you think of it?