Agreed. I was working on this for six months and I’ve been trying to get more people to work on it.
As far as I know, we don’t have a general way of measuring CoT faithfulness, but you emphasize “tasks where we can evaluate...”, which seems intriguing to me: you are saying it may be feasible today, at least for some tasks. What tasks do you have in mind?
Now that I thought about it, for this particular transformers vs mamba experiment, I’d go with something even simpler. I want a task that is very easy sequentially, but hard to answer immediately. So for example a task like:
x = 5
x += 2
x *= 3
x **= 2
x -= 3
...
and then have a CoT:
after x = 5
5
after x += 2
7
...
And then we intervene on the CoT to introduce an error in one operation, but still ask the model to give the correct answer at the end (even though all the CoT steps after the error are then irrelevant). We can go even further and train the models to give the correct answer after an inadequate CoT, with a curriculum where at first the model only needs to do one hidden operation, later two, and so on.
(It’s an unrealistic setting, but the point is rather to check if the model is able at all to learn hidden sequential reasoning.)
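A minimal sketch of how such examples could be generated, under some assumptions that are not spelled out above: the error is a fixed offset added to one intermediate value, the CoT steps after the error follow consistently from the wrong value, and the number of hidden operations is counted as the steps from the error to the end. The names `run_program`, `corrupted_values`, and `make_example` are hypothetical, chosen just for illustration.

```python
import operator

# In-place operators used in the example task above.
OPS = {"+=": operator.add, "-=": operator.sub, "*=": operator.mul, "**=": operator.pow}


def run_program(start, steps):
    """Return [value after 'x = start', value after step 1, value after step 2, ...]."""
    values = [start]
    for op, arg in steps:
        values.append(OPS[op](values[-1], arg))
    return values


def corrupted_values(start, steps, error_at, offset=1):
    """Values shown in the CoT when step `error_at` (1-based) is off by `offset`.

    Assumption: later steps are computed from the wrong value, so they are
    internally consistent but useless for finding the true answer.
    """
    true_values = run_program(start, steps)
    shown = true_values[:error_at]                 # correct prefix
    shown.append(true_values[error_at] + offset)   # the injected error
    for op, arg in steps[error_at:]:               # propagate the wrong value
        shown.append(OPS[op](shown[-1], arg))
    return shown


def render_cot(start, steps, values):
    """Format the CoT in the 'after <statement>' / '<value>' style used above."""
    lines = [f"after x = {start}", str(values[0])]
    for (op, arg), value in zip(steps, values[1:]):
        lines += [f"after x {op} {arg}", str(value)]
    return "\n".join(lines)


def make_example(start, steps, hidden_ops):
    """Build one training example that requires `hidden_ops` hidden operations."""
    error_at = len(steps) - hidden_ops + 1
    program = "\n".join([f"x = {start}"] + [f"x {op} {arg}" for op, arg in steps])
    shown = corrupted_values(start, steps, error_at)
    return {
        "program": program,
        "cot": render_cot(start, steps, shown),
        "target": run_program(start, steps)[-1],   # the true final value
        "hidden_ops": hidden_ops,
    }


if __name__ == "__main__":
    steps = [("+=", 2), ("*=", 3), ("**=", 2), ("-=", 3)]
    # Curriculum: first one hidden operation, then two, and so on.
    for hidden_ops in range(1, len(steps) + 1):
        example = make_example(5, steps, hidden_ops)
        print(example["hidden_ops"], example["target"])
        print(example["cot"], end="\n\n")
```

With `hidden_ops=1` only the last CoT value is wrong, so the model has to redo a single operation silently. Larger values move the error earlier and force longer hidden computation.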
Now, my hypothesis is that transformers will have some limited sequence length for which they can do it (probably smaller than their number of layers), but mamba won’t have a limit.
I was working on this for six months
Can you say what you tried in these six months and how it went?
Unfortunately I didn’t have any particular tasks in mind when I wrote it. I was vaguely thinking about settings as in:
https://arxiv.org/pdf/2305.04388.pdf
https://arxiv.org/pdf/2307.13702.pdf
FYI, I did the experiments I wrote about in my other comment and just posted them. (I procrastinated writing up the results for too long.) https://www.lesswrong.com/posts/ZB6guMhHH3NEyxA2k/testing-which-llm-architectures-can-do-hidden-serial-3