I don’t know how much the difficulty of crossing the theory-practice gap has deviated from your expectations since then.
It’s been pretty on par.
But I would indeed be worried that a lot of the difficulty is going to be in getting any good results for deep learning, and that finding additional theoretical/conceptual results in other settings doesn’t constitute much progress on that.
Amusingly, I tend to worry more about the opposite failure mode: findings on today’s nets won’t generalize to tomorrow’s nets (even without another transformers-level paradigm shift), and therefore leveraging evidence from other places is the only way to do work which will actually be relevant.
(More accurately, I worry that the relevance or use-cases of findings on today’s nets won’t generalize to tomorrow’s nets. Central example: if we go from a GPT-style LLM to a much bigger o1/o3-style model which is effectively simulating a whole society talking to each other, then the relationship between the tokens and the real-world effects of the system changes a lot. So even if work on the GPT-style models tells us something about the o1/o3-style models, its relevance is potentially very different.)
I assume that was some other type of experiment involving image generators? (and the notion of “working well” there isn’t directly comparable to what you tried now?)
Yeah, that was on a little MNIST net. And the degree of success I saw in that earlier experiment was actually about on par with what we saw in our more recent experiments; our bar was just quite a lot higher this time around. This time we were aiming for things like “move one person’s head” rather than “move any stuff in any natural way at all”.