Now that I’ve thought about it, for this particular transformers vs. Mamba experiment I’d go with something even simpler. I want a task that is very easy to do sequentially, but hard to answer immediately. So for example a task like:
x = 5
x += 2
x *= 3
x **= 2
x -= 3
...
and then have a CoT:
after x = 5
5
after x += 2
7
...
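To make this concrete, here is a minimal sketch of a generator for such task/CoT pairs (purely illustrative: the function name, the exact line format, and the choice of operations are my assumptions, not an existing implementation; I drop **= here just to keep the values small):

import random

def make_task(num_ops, init=5, rng=random):
    """Sample a random sequence of in-place integer ops plus its step-by-step CoT."""
    x = init
    task = [f"x = {init}"]
    cot = [f"after x = {init}", str(x)]
    for _ in range(num_ops):
        op = rng.choice(["+=", "-=", "*="])
        # keep multiplication arguments small so values don't blow up over long sequences
        arg = rng.randint(2, 3) if op == "*=" else rng.randint(2, 9)
        if op == "+=":
            x += arg
        elif op == "-=":
            x -= arg
        else:
            x *= arg
        task.append(f"x {op} {arg}")
        cot.append(f"after x {op} {arg}")
        cot.append(str(x))
    return task, cot, x  # x is the ground-truth final answer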
And then we intervene on the CoT to introduce an error in one operation, but still ask the model to give the correct answer at the end (even though all CoT steps after the error are then useless to it). We can go even further and train the models to give the correct answer after such inadequate CoT, with a curriculum where at first the model only needs to do one hidden operation, later two, and so on.
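One way the intervention could look, under my reading that the error should propagate forward (so the last k operations have to be done "in the model's head"); again just a sketch, assuming the (task, cot) format from the generator above:

import random

def _apply(op_line, x):
    """Apply a single 'x <op> <arg>' line to the integer x."""
    _, op, arg = op_line.split()
    arg = int(arg)
    return {"+=": x + arg, "-=": x - arg, "*=": x * arg}[op]

def corrupt_cot(task, cot, hidden_ops, rng=random):
    """Falsify the CoT value hidden_ops steps before the end and propagate the error,
    so producing the true final answer requires redoing those last ops internally."""
    assert 1 <= hidden_ops < len(task)
    corrupted = list(cot)
    err_step = len(task) - hidden_ops            # first step whose CoT value is wrong
    wrong = int(cot[2 * err_step + 1]) + rng.choice([-3, -2, -1, 1, 2, 3])
    corrupted[2 * err_step + 1] = str(wrong)
    x = wrong
    for step in range(err_step + 1, len(task)):  # later steps continue from the wrong value
        x = _apply(task[step], x)
        corrupted[2 * step + 1] = str(x)
    return corrupted  # the training target stays the uncorrupted final answer

The curriculum would then just mean sampling with hidden_ops = 1 at first and increasing it over training.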
(It’s an unrealistic setting, but the point is rather to check if the model is able at all to learn hidden sequential reasoning.)
Now, my hypothesis is that transformers will only manage this up to some limited number of hidden steps (probably smaller than their number of layers), but Mamba won’t have such a limit.
I was working on this for six months
Can you say what you tried in those six months and how it went?
Unfortunately, I didn’t have any particular tasks in mind when I wrote it. I was vaguely thinking about settings like those in:
https://arxiv.org/pdf/2305.04388.pdf
https://arxiv.org/pdf/2307.13702.pdf