How does shard theory differ from the Olah-style interpretability agenda? Why is there any reason to believe we can learn about “shards” without interpretability?