More generally, science is about identifying the structure and patterns in the world; the task taxonomy learned by powerful language models may be very convergent and could be a useful map for understanding the territory of the world we are in. What’s more, such a decomposition would itself be of scientifico-philosophical interest — it would tell us something about thinking.
I would love to see someone expand on the ways we could use interpretability to learn about the world, or the structure of tasks (or perhaps examples of how we’ve already done this?). Aside from being interesting scientifically, maybe this could also help us build economically valuable systems which are more explicit and predictable?
I would love to see someone expand on the ways we could use interpretability to learn about the world, or the structure of tasks (or perhaps examples of how we’ve already done this?). Aside from being interesting scientifically, maybe this could also help us build economically valuable systems which are more explicit and predictable?