[Proposal] Is reasoning in natural language grokkable? Training models on language formulations of toy tasks.
Previous work on grokking finds that models can grok modular addition and tree search. However, these tasks are not formulated in natural language. Instead, the tokens correspond directly to the true underlying abstract entities, such as numerical values or nodes in a graph. I want to ask whether this representational simplicity is a key ingredient of grokking reasoning.
I have a prior that expressing concepts in natural language (as opposed to directly representing concepts as tokens) introduces an additional layer of complexity which makes grokking much more difficult.
The proposal here is to repeat the experiments with tasks that test equivalent reasoning skills, but which are formulated in natural language.
Modular addition can be formulated as “day of the week” math, as has been done previously.
Tree search is more difficult to formulate, but might be phrasable as some kind of navigation instruction.
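To make the contrast concrete, here is a minimal sketch of how the same modular addition problem could be generated in both forms. Everything in it is an assumption for illustration: the function names, the prompt template, and the train/test split are hypothetical and not taken from prior work.

```python
import random

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def token_example(a, b, p=7):
    """Direct concept tokenization: each residue is its own token id."""
    return [a, b], (a + b) % p


def language_example(a, b):
    """The same problem (addition modulo 7) phrased in natural language.

    The prompt template is a hypothetical choice; a real experiment
    would probably vary the phrasing.
    """
    prompt = f"Today is {DAYS[a]}. What day of the week will it be in {b} days?"
    return prompt, DAYS[(a + b) % 7]


def make_split(train_fraction=0.3, seed=0):
    """Shuffle all (a, b) pairs and hold out the rest for testing."""
    pairs = [(a, b) for a in range(7) for b in range(7)]
    random.Random(seed).shuffle(pairs)
    cut = int(train_fraction * len(pairs))
    return pairs[:cut], pairs[cut:]


if __name__ == "__main__":
    train_pairs, test_pairs = make_split()
    for a, b in train_pairs[:3]:
        print(token_example(a, b), language_example(a, b))
```

Note that a modulus of 7 gives only 49 distinct pairs, which may be too small a task to study; allowing larger offsets `b` in the language version is one possible way to enlarge it, though the underlying residue structure stays the same.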
I’d expect that we could still observe grokking, but that it might take much longer (and require larger models) than with “direct concept tokenization”. Conditional on grokking occurring, it would be interesting to see whether we recover the same kinds of circuits as demonstrated in prior work.