Excellent post. Big upvote, and I’m still digesting all of the points you’ve made. I’ll respond more substantively later. For now, a note on possible terminology. I wrote a followup to my brief “agentized LLMs” post, “Capabilities and alignment of LLM cognitive architectures”, where I went into more depth on capabilities and alignment; I made many but not all of the points you raised. I proposed the term language model cognitive architectures (LMCAs) there, but I’m now favoring “language model agents” as a more intuitive and general term.
The tag someone just applied to this post, Chain-of-Thought Alignment, has been a good link to related thinking.
I’m a bit surprised that there isn’t more activity on this in the alignment community yet, but time will tell if this approach works and takes off as well and as fast as I expect.
More soon. I’m working on a followup post that may be a good pair for this one, making more explicit the alignment advantages and arguing that we should actually push capabilities in this direction since this seems like a lot of upside and very little downside relative to other potential routes to AGI.
I am also surprised at how little attention these systems have been receiving.
I was reading about CoT reasoning and early S-LLMs around September of last year, around the same time I encountered Yann LeCun’s “A Path Toward Autonomous Machine Intelligence”. While LeCun’s paper barely discusses language models, it does provide a plausible framework for building a cognitive architecture.
The above planted the seed, so that when I saw the BabyAGI architecture diagram I immediately thought “This does plausibly seem like a paradigm that could lead to very powerful models (and I wish nobody had thought of it)”.
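(For anyone who hasn’t looked at it: the whole architecture is roughly an LLM wrapped in a loop that executes a task, generates follow-up tasks, and reprioritizes a queue. Here’s a minimal sketch in Python of that loop as I understand it; `call_llm` is a placeholder for whatever completion API you’d use, and the names and prompts are illustrative rather than BabyAGI’s actual code.)

```python
from collections import deque

def call_llm(prompt: str) -> str:
    """Placeholder for whatever chat/completion API you use."""
    raise NotImplementedError

def run_agent(objective: str, max_steps: int = 10) -> None:
    # Task queue seeded with a single starting task.
    tasks = deque(["Make a plan for achieving the objective."])
    completed = []  # running memory of (task, result) pairs

    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()

        # Execution step: ask the LLM to do the current task.
        result = call_llm(
            f"Objective: {objective}\nCompleted so far: {completed}\n"
            f"Your task: {task}\nResult:"
        )
        completed.append((task, result))

        # Task-creation step: ask the LLM for follow-up tasks, one per line.
        new_tasks = call_llm(
            f"Objective: {objective}\nLast task: {task}\nLast result: {result}\n"
            "List any new tasks needed, one per line:"
        ).splitlines()
        tasks.extend(t.strip() for t in new_tasks if t.strip())

        # Prioritization step: ask the LLM to reorder the remaining queue.
        reordered = call_llm(
            f"Objective: {objective}\nReorder these tasks by priority, one per line:\n"
            + "\n".join(tasks)
        ).splitlines()
        tasks = deque(t.strip() for t in reordered if t.strip())
```

The point is less the particular prompts than how little scaffolding it takes, which is part of why it now seems unreasonable to have hoped no one would build it.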
You seem very optimistic about these systems. I think the crux of the disagreement will be that I think it’s plausible these systems will bring about AGI sooner than a pathway which only involves trying to train larger and larger models (at incredible cost).
I’d be keen to read the draft if you’re offering.
I’ll show you that draft when it’s ready; thanks for the offer!
A couple of thoughts:
At this point I’m torn between optimism based on the better interpretability and pessimism based on the multipolar scenario. The timeline doesn’t bother me that much, since I don’t think more general alignment work would help much in aligning those specific systems if they make it to AGI; and of course I’d like a longer timeline for me and others to keep enjoying life. My optimism is relative, and I still have something like a vague 50% chance of failure.
Shorter timelines have the interesting advantage of avoiding the compute and algorithm overhangs that create fast, discontinuous progress. This new post makes the case in detail. I’m not at all sure this advantage outweighs the loss of time to work on alignment, since that’s certainly helpful.
https://www.lesswrong.com/posts/YkwiBmHE3ss7FNe35/short-timelines-and-slow-continuous-takeoff-as-the-safest
So I’m entirely unsure whether I wish no one had thought of this. But in retrospect it seems like too obvious an idea to miss. The fact that almost everyone in the alignment community (including me) was blindsided by it seems like a warning sign that we need to work harder to predict new technologies rather than fight the last war. One interesting factor is that many of us who saw this, or had vague thoughts in this direction, never mentioned it publicly to avoid helping progress; but the hope that no one else would hit on such an obvious idea fairly quickly was, in retrospect, totally unreasonable.