Has anyone tried something like this?
Create a conlang with very simple grammar and a small vocabulary (not Toki Pona small, more like xkcd Thing Explainer small).
Use LLMs to translate a large amount of text into this conlang (a rough sketch of this step is below).
Train a new LLM on these translations.
Do interpretability research on this LLM.
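The translation step seems like the easiest part to prototype. Here is a minimal sketch, assuming the OpenAI Python client; the model name, the conlang_spec.txt grammar file, and the file layout are placeholders for whatever setup you actually use, not a tested pipeline.

```python
# Rough sketch of step 2: batch-translating a text corpus into the conlang.
# Assumes the OpenAI Python client; model name, prompt, and file layout are
# placeholders, not a worked-out pipeline.
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()

# A short grammar/vocabulary reference for the conlang, kept in the prompt so
# the translator model stays inside the restricted language.
CONLANG_SPEC = Path("conlang_spec.txt").read_text()


def translate_to_conlang(text: str) -> str:
    """Ask the LLM to rewrite one document using only the conlang."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {
                "role": "system",
                "content": "Translate the user's text into the constructed "
                           "language described below, using only its grammar "
                           "and vocabulary.\n\n" + CONLANG_SPEC,
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content


def build_corpus(src_dir: str, out_path: str) -> None:
    """Translate every .txt file in src_dir into a JSONL training corpus."""
    with open(out_path, "w") as out:
        for path in sorted(Path(src_dir).glob("*.txt")):
            translated = translate_to_conlang(path.read_text())
            out.write(json.dumps({"source": path.name, "text": translated}) + "\n")


if __name__ == "__main__":
    build_corpus("raw_texts", "conlang_corpus.jsonl")
```

The resulting JSONL could then feed a small-model training run, which is where the interpretability work (step 4) would start.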
There is a Simple English Wikipedia with over 200 000 articles, which is not exactly what you want, but seems to be a thing that already exists and is somewhat in that direction.
I don’t recall any interpretability experiments with TinyStories offhand, but I’d be surprised if there aren’t any.
I agree that this sounds interesting and that I haven't heard of anyone doing this yet. I have heard of some interpretability experiments with TinyStories, as Zac mentioned. I think the more interesting thing would be a dataset enriched with synthetic data showing inherently logical content: deductive symbolic logic and math problems worked out (correctly!) step by step. You could have a dataset of this, plus simplified-language versions of middle school through undergrad science textbooks. I expect the resulting model would be more logical and cohesive. It would be interesting to see whether this made it fundamentally more interpretable.
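To make that concrete, here is a minimal sketch of what such synthetic data could look like: programmatically generated arithmetic and syllogism problems, so every worked step is correct by construction. The templates, nonce words, and JSONL format are illustrative assumptions, not an actual dataset spec.

```python
# Minimal sketch: synthetic worked math and deductive-logic examples.
# Templates and output format are illustrative only.
import json
import random


def worked_addition(rng: random.Random) -> str:
    """Two-digit addition, worked column by column."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    ones = a % 10 + b % 10
    carry = ones // 10
    tens = a // 10 + b // 10 + carry
    return (
        f"Problem: {a} + {b} = ?\n"
        f"Step 1: add the ones digits: {a % 10} + {b % 10} = {ones}.\n"
        f"Step 2: write {ones % 10}, carry {carry}.\n"
        f"Step 3: add the tens digits plus the carry: {a // 10} + {b // 10} + {carry} = {tens}.\n"
        f"Answer: {a} + {b} = {tens * 10 + ones % 10}."
    )


def worked_syllogism(rng: random.Random) -> str:
    """A simple all-A-are-C syllogism with the deduction spelled out."""
    a, b, c = rng.sample(["wugs", "blicks", "fepps", "zorks"], 3)
    return (
        f"Premise 1: all {a} are {b}.\n"
        f"Premise 2: all {b} are {c}.\n"
        f"Step 1: take any {a.rstrip('s')}; by premise 1 it is a {b.rstrip('s')}.\n"
        f"Step 2: by premise 2, every {b.rstrip('s')} is a {c.rstrip('s')}.\n"
        f"Conclusion: all {a} are {c}."
    )


def build_dataset(n: int, out_path: str, seed: int = 0) -> None:
    """Write n worked examples as a JSONL file for pretraining mix-in."""
    rng = random.Random(seed)
    with open(out_path, "w") as out:
        for _ in range(n):
            text = rng.choice([worked_addition, worked_syllogism])(rng)
            out.write(json.dumps({"text": text}) + "\n")


if __name__ == "__main__":
    build_dataset(100_000, "synthetic_logic.jsonl")
```

Because the generator controls the answers, the "correctly!" requirement is automatic, and you can scale the mix of logic versus textbook-style prose however you like.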