This was a really interesting read. I’m glad you decided to post it.
I find all of the pieces fairly interesting and plausible in isolation, but I think the post either underexplains or understates the case for how it all fits together. As written, the post roughly says:
first section: there are probably some kinds of computation which can’t be efficiently supported (though such computations might not actually matter)
second section: here’s one particular kind of computation which seems pretty central to how humans think (though I don’t really see a case made here that it’s necessary)
third section: some things which might therefore be difficult
I do think the case can be strengthened a lot, especially for low sample efficiency and the difficulty of inventing new concepts. Here’s a rough outline of the argument I would make:
Sample efficiency is all about how well we approximate Bayesian reasoning. This is one of the few places where both the theory and the empirics make a particularly strong case for Bayes.
Bayesian reasoning, on the sort of problems we’re talking about, means generative models. So if we want sample efficiency, then generative models (or a good approximation thereof) are a necessary element. (I’ll sketch a toy version of these first two steps right after this outline.)
GPT-style models do not have “basin of attraction”-style convergence, where they learn the general concept of generative models and can then easily create new generative models going forward. They have to converge to each new generative model the hard way.
That last step is the one I’d be most uncertain about, but it’s also a claim which is “just math”, so it could be checked by either analysis or simulation if we know how the models in question are embedded.
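To pin down what the first two steps are claiming, here’s the simplest toy case I can think of (my example, purely illustrative): flips of a coin with unknown bias θ. The generative model is Bernoulli(θ), and Bayesian reasoning over it is just

$$P(\theta \mid D) \;\propto\; P(D \mid \theta)\,P(\theta), \qquad \theta \mid D \sim \mathrm{Beta}(1+k,\; 1+n-k)$$

for a uniform prior, n flips, and k heads, with posterior predictive P(heads | D) = (k+1)/(n+2). The posterior extracts everything the data says about θ, which is the sense in which Bayes-over-a-generative-model sets the benchmark for sample efficiency.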
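And as a gesture at the “simulation” option for checking that last step, here’s a deliberately simplified sketch (Python; entirely my own toy construction, with a fixed-step online learner standing in very loosely for “converging the hard way”): draw a fresh generative model each trial and compare how fast estimation error shrinks for an exact Bayesian updater versus the fixed-step learner.

```python
# Toy comparison (illustrative only): exact Bayesian updating vs. a
# fixed-step online learner, on data from a freshly drawn generative
# model (a coin with bias theta ~ Uniform(0, 1)) in every trial.
import numpy as np

rng = np.random.default_rng(0)

def bayes_estimate(flips):
    # Posterior mean under a Beta(1,1) prior: the learner that carries
    # the generative model and updates it exactly.
    return (np.sum(flips) + 1) / (len(flips) + 2)

def fixed_step_estimate(flips, lr=0.1):
    # Online gradient step on squared loss with a fixed learning rate:
    # a crude stand-in for a learner with no calibrated posterior.
    p = 0.5
    for x in flips:
        p += lr * (x - p)
    return p

def mean_abs_error(estimator, n, trials=2000):
    # Average |estimate - theta| over many freshly drawn models.
    errs = []
    for _ in range(trials):
        theta = rng.random()
        flips = (rng.random(n) < theta).astype(float)
        errs.append(abs(estimator(flips) - theta))
    return float(np.mean(errs))

for n in (5, 20, 100):
    print(f"n={n:3d}  bayes={mean_abs_error(bayes_estimate, n):.3f}"
          f"  fixed-step={mean_abs_error(fixed_step_estimate, n):.3f}")
```

The Bayesian learner’s error keeps shrinking with n, while the fixed-step learner’s plateaus at a floor set by its learning rate. The version of this I’d actually want is the analogous curve for a GPT-style model’s in-context predictions as the underlying generative model is redrawn; that’s the kind of simulation I mean.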