I agree that this sounds interesting, and I haven’t heard of anyone doing this yet either. I have heard of some interpretability experiments with TinyStories, as Zac mentioned. I think the more interesting direction would be a dataset enriched with synthetic data showing inherently logical content, like deductive symbolic logic and math problems worked out (correctly!) step by step. You could combine a dataset like that with simplified-language versions of middle school through undergrad science textbooks. I expect the resulting model would be more logical and cohesive, and it would be interesting to see whether this made it fundamentally more interpretable.
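To make the idea concrete, here is a minimal sketch of what generating that kind of synthetic data could look like. This is purely illustrative (the function names and problem templates are my own invention, not from any existing dataset): one generator emits multi-digit addition worked out column by column, and another emits a trivial modus ponens deduction.

```python
import random

def worked_addition_example(rng: random.Random) -> str:
    """Emit a two-digit addition problem worked out column by column."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    ones = a % 10 + b % 10          # sum of the ones digits
    carry = ones // 10              # 1 if the ones column overflows, else 0
    lines = [
        f"Problem: {a} + {b} = ?",
        f"Step 1: add the ones digits: {a % 10} + {b % 10} = {ones}"
        + (f", write {ones % 10} and carry {carry}." if carry else "."),
        f"Step 2: add the tens digits"
        + (" plus the carry" if carry else "")
        + f": {a // 10} + {b // 10}"
        + (f" + {carry}" if carry else "")
        + f" = {a // 10 + b // 10 + carry}.",
        f"Answer: {a + b}",
    ]
    return "\n".join(lines)

def modus_ponens_example(rng: random.Random) -> str:
    """Emit a one-step deductive argument (modus ponens)."""
    p, q = rng.sample(
        ["it rains", "the ground is wet", "the match is cancelled"], 2
    )
    return (
        f"Premise 1: If {p}, then {q}.\n"
        f"Premise 2: {p.capitalize()}.\n"
        f"Step 1: Premise 2 matches the antecedent of Premise 1, "
        f"so modus ponens applies.\n"
        f"Conclusion: {q.capitalize()}."
    )
```

Templated generation like this guarantees the worked steps are correct by construction, which matters for the "(correctly!)" requirement; real projects would presumably want far more problem families and surface variety.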