Deliberative Cognitive Algorithms as Scaffolding
As rationalists, we are interested in finding systematic techniques that boost our effective intelligence. Because we tend to be mathematical thinkers, we usually look for precise algorithmic solutions, such as applying the inference rules of Bayesian probability exactly when this is feasible. I will call this the “rationalist project” because it seems like a reasonable candidate for the title (not because it is the only project, or necessarily the primary one). There are other natural ways to increase intelligence (neuroscientific or pharmaceutical tricks), but at least for me these are not as intellectually satisfying, and they also seem unlikely to yield large percentage or qualitative gains for most people. Pursuing our project, we often borrow from other (academic) fields, most notably artificial intelligence, cognitive science, and economics. None of these is really suitable.
Artificial intelligence studies how to build intelligent artifacts. In the early days, this had something to do with studying logic, heuristics, and abstractions, which are tools that humans can and did apply (though I am not so sure GOFAI increased our ability to apply them ourselves, except through loose analogies). Nowadays, the goals of A.I. research have started to come apart from the goals of the rationalist project. The art of distributing training across GPUs and choosing architectures that fully utilize them, constructing stable gradients, etc., has no direct relationship to the project. Building a superintelligence does not necessarily require the same skills as bootstrapping yourself into one without access to your own hardware. I think that this divergence of goals is sometimes ignored on LessWrong, because A.I. seems more interesting and important the more it succeeds. But in many ways, the shifts in focus that drove its success also make it less central to the project.
Cognitive science is much closer to the project by its nature. However, as a science, it focuses primarily on how people do think, and not on how they should. The cognitive science research that does address how people should think tends to be a little less algorithmic and rigorous than in A.I., at least in my limited experience. One exception is this excellent paper from Josh Tenenbaum, which I will return to in this sequence: https://cocosci.princeton.edu/tom/papers/OneAndDone.pdf
Bayesian decision theory is the central mathematical framework of rationalist thought, and one of many useful tools adopted from economics. However, it is a mathematical specification of optimal decision making, not an algorithm for optimal decision making. Jaynes contributed much to the foundations of the project, but even his goals in Probability Theory as Logic were (explicitly!) not the goals of the project. He imagines ideal reasoning as carried out by an ideal “robot” reasoner unconstrained by computational limits. Economists do, of course, consider models of decision making under computational limits (bounded rationality). However, even this does not precisely describe our situation as human reasoners: because much of our processing is unconscious, we are not capable of adopting the optimal algorithm for a decision problem even if we knew what it was! Instead we must adopt the best algorithm that we can actually run, given only indirect access to the parts of our minds that we don’t consciously control.
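To make the distinction between specification and algorithm concrete: in one standard formulation (the symbols here are generic, not taken from any particular source), Bayesian decision theory says the optimal action is

$$a^* = \arg\max_{a \in A} \; \sum_{h} P(h \mid \text{data}) \, U(a, h).$$

This tells us which action counts as optimal; it says nothing about how to compute the posterior or search over $A$ within the time and memory we actually have, and for realistic hypothesis spaces the sum (or integral) is intractable.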
The project seeks deliberative cognitive algorithms: algorithms that we can learn and consciously execute. These algorithms execute in an environment with access to certain fixed resources: our senses are very good feature extractors, and our unconscious minds are very good pattern recognizers. The optimal deliberative cognitive algorithms would take advantage of these resources; but the internal workings of these resources may not be relevant (that is, may be “screened off” by their teleology and limitations). Constructing such DCA’s with black box access to powerful unconscious modules is the goal of the rationalist project.
There is already a term for such DCA’s: scaffolding.
This term was originally introduced in Scaffolded LLMs as natural language computers. I don’t fully endorse the frame of that post, and I am interested in scaffolding various kinds of modules, not restricted to LLMs.
In complexity theory, we might call these “algorithms with oracle access.” So I will tend to say that DCA’s have oracle access to modules.
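As a toy illustration of this framing (only a sketch: the `pattern_oracle` interface and the voting rule are invented for the example, not a claim about how our modules actually behave):

```python
from typing import Callable, Iterable, Tuple

def deliberate(question: str,
               reframings: Iterable[str],
               pattern_oracle: Callable[[str], Tuple[str, float]]) -> str:
    """A toy deliberative cognitive algorithm (DCA) with oracle access.

    The oracle is a black box: we can query it and get back a guess plus a
    confidence, but we cannot inspect or modify its internals. The
    deliberative layer controls only which queries to make and how to
    combine the answers.
    """
    votes = {}
    for frame in reframings:
        guess, confidence = pattern_oracle(f"{frame}: {question}")
        votes[guess] = votes.get(guess, 0.0) + confidence
    # Deliberate step: return the answer with the most oracle support.
    return max(votes, key=votes.get)
```

The point is just that the consciously controllable part is the query strategy and the aggregation rule, while the oracle’s internals are screened off.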
Explicitly, scaffolding research advances the rationalist project because it tells us how to best consciously take advantage of our unconscious abilities.
I do not claim that scaffolding is the best way to achieve A.G.I. This is probably false. It is probably best for higher level executive functions to interweave their computations closely with the modules they use, perhaps even to the extent that there is no obvious separation (though in the brain, I understand from Capabilities and alignment of LLM cognitive architectures that there is great functional separation). The fastest way to build an A.G.I. is probably to train the whole thing end to end in some fashion, so that it can take the best possible advantage of all available synergies between tasks. Scaffolding generative models to create agents resembles the long tradition of neuro-symbolic A.I. that never really works; I think it is based on the fantasy that if we don’t know how to build capability X into system M, we can fake it by adding capability X on top of system M. This is consistently a kludge, and I don’t know of any significant progress arising from it.
Indeed, the interweaving of deliberate executive functions with unconscious functions probably takes place in human minds as well. It’s possible that these are so inextricable that humans can’t effectively wield scaffolding methods, but I suspect not.
I want to explore the possibility that scaffolding is the right frame for the rationalist project. Over the course of this sequence, I will explore the theoretical and experimental sides of this thesis, attempting to establish scaffolding on firmer mathematical and scientific ground. Though this is not my primary goal, I will also explore the implications for alignment (at risk of being caught in an affective death spiral, I believe scaffolding is relevant to running our minds more effectively with black box access to our unconscious resources AND perhaps to building safe agents with black box access to un-agentic parts).
Let’s get started!
I mostly agree with you here, but I think in this case the details do matter a bit. I know that the main topic of your post is how to improve human rationality, but I’m going to go on a side tangent to talk about Recursive Self Improvement.
In the case where we have a generative model which can write code, generate simple hypotheses from data, utilize an API to initiate tests based on those hypotheses, analyze the resulting data, and repeat… we have a sort of dumb brute-force scientist. If we have a lot of those, and throw them at the problem of improving AI (which is a plausible thing that could come to pass), then we might see sudden-seeming progress on developing an improved algorithm which incorporates more of the thought process into the model itself.
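Roughly, the loop I have in mind looks like the following sketch, where `llm` and `run_experiment` are stand-ins for whatever model and experiment-running API are actually available (not real interfaces):

```python
def brute_force_scientist(goal: str, llm, run_experiment, budget: int = 100):
    """Sketch of the 'dumb brute-force scientist' loop described above.

    llm(prompt) -> str and run_experiment(code) -> str are hypothetical
    black boxes: a generative model and an API that runs proposed tests.
    """
    findings = []
    for _ in range(budget):
        hypothesis = llm(f"Goal: {goal}\nKnown so far: {findings}\n"
                         "Propose one simple, testable hypothesis.")
        test_code = llm(f"Write code to test this hypothesis: {hypothesis}")
        result = run_experiment(test_code)
        findings.append(llm(f"Hypothesis: {hypothesis}\nResult: {result}\n"
                            "What did we learn?"))
    return findings
```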
I believe we are just shy of the point where the frontier LLMs are sufficient to act even as dumb versions of a brute-force scientist. I think we are close enough that a temporary kludgey hack might get us over the edge. Specifically, I think that scaffolding such as is described in this recent paper from DeepMind, combined with a stronger successor to GPT-4 (e.g. GPT-5), would probably be enough.
I think that the same algorithmic progress which incorporates more of the meta-level reasoning into the model and training process itself will also lead to efficiency gains in training.
Thus, I have made this prediction market to express my expectation that GPT-5, used in this brute-force scientist way with adequate scaffolding, will result in finding advances sufficient to train a next generation of superior models with only the same amount of compute used to train GPT-5.
Yes, there is also an earlier Microsoft Research paper which is, in some sense, more modest, but which points more directly towards recursive self-improvement: a scaffolding+LLM system generates a better scaffolding for the same LLM, and this operation is repeated several times with better and better scaffolding (roughly the loop sketched at the end of this comment).
One particularly interesting piece of data there is Figure 4 on page 6, which shows the dependence on the quality of the underlying LLM: the process does not work and leads to degradation with GPT-3.5, while the same process successfully self-improves for a few iterations (but then saturates) with GPT-4.
So one might ask how big this self-improvement might be with a better underlying LLM.
Independently of the quality of the underlying LLM, I think it is not too difficult to improve methods for scaffolding generation quite a bit (there are various ways to make it much better than in these papers, even with the current LLMs). So one indeed wonders how soon this becomes a major contributor to the take-off speed...
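For concreteness, the self-improvement loop from the Microsoft Research paper is roughly the following sketch; `llm`, `evaluate`, and the seed scaffold are stand-ins, not the actual setup from either paper:

```python
def recursive_scaffold_improvement(llm, evaluate, seed_scaffold: str,
                                   iterations: int = 5) -> str:
    """Sketch of a scaffolding+LLM system rewriting its own scaffolding.

    llm(prompt) -> str is a fixed underlying model; evaluate(scaffold) -> float
    scores a scaffold program on some benchmark. Only the scaffold changes
    between iterations; the underlying model stays the same.
    """
    best_scaffold, best_score = seed_scaffold, evaluate(seed_scaffold)
    for _ in range(iterations):
        candidate = llm(
            "Here is a scaffold program that wraps you to solve tasks:\n"
            f"{best_scaffold}\n"
            f"It currently scores {best_score}. Write an improved version."
        )
        score = evaluate(candidate)
        if score <= best_score:
            break  # degradation or saturation, as with weaker base models
        best_scaffold, best_score = candidate, score
    return best_scaffold
```

In these terms, the GPT-3.5 vs. GPT-4 difference in Figure 4 is roughly whether the loop exits immediately or only after a few productive iterations.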
It is probably possible to make some form of scaffolding work, I’m just skeptical that it’s going to be as effective as training an agent directly. Depending on timelines, scaffolding might still feed progress towards superintelligence.
What is most upstream of good cognitive strategies that lead to useful behaviors? What is most upstream of bad cognitive strategies that lead to maladaptive behaviors?