IMO, these considerations do lengthen expected timelines, but not enough, and certainly not enough that we can ignore the possibility of very short timelines. The distribution of timelines matters a lot, not just the point estimate.
If I’m right about that, we are still not directing enough of our resources to this possibility. We have limited time and brainpower in alignment, but it looks to me like relatively little is directed at scenarios in which the combination of scaling and scaffolding language models fairly quickly achieves competent agentic AGI. More on this below.
Excellent post, big upvote.
These are good questions, and you’ve explained them well.
I have a bunch to say on this topic. Some of it I’m not sure I should say in public, as even being convincing about the viability of this route will cause more people to work on it. So to be vague: I think applying systems thinking from cognitive psychology and cognitive neuroscience to the problem suggests many routes to scaffold around the weaknesses of LLMs. I suspect the route from here to AGI is disjunctive; several approaches could work relatively quickly. There are doubtless challenges to implementing that scaffolding, as people have encountered over the last year while pursuing the most obvious and simple routes to scaffold LLMs into more capable agents.
A note on biases: I’m afraid people in AI safety are sometimes not taking short timelines seriously because it is emotionally difficult to hold both that view and a pessimistic view of alignment difficulty. I myself am biased in both directions; there’s an emotional pull toward having my theories proven right, and toward seeing cool agents soon, before they’re quite smart enough to take over. In other moments I am fervently hoping that LLMs are deceptively far from competent agentic AGI, and that we have more years to work and enjoy. I think it would be wonderful to keep pushing beyond guesses, as you suggest. But I don’t know how to publicly estimate possible timelines without being specific enough and convincing enough to aid progress in that direction, so I intend to first write a post about the question: if we hypothetically could get to AGI from scaffolded LLMs, should we push in that direction, given their apparently large advantages for alignment? I suspect the answer is yes, but I’m not going to make that decision unilaterally, and I haven’t yet gotten enough people to engage deeply in private to be confident in the answer.
So for now, I’ll just say: it really doesn’t look like we can rule it out, so we should be working on alignment for possible short timelines to full AGI from language model agents (LMAs).
Which brings me back to the note above: much of the limited resources in alignment are going into “aligning” language models. Scare quotes to indicate the disagreement over whether, or how much, that work will contribute to aligning agents built out of LLMs. Whether prosaic alignment is enough, or even contributes much, is a nontrivial question. I’ve posted about it in the past and am currently attempting to think through and write about it more clearly.
My current answer is that prosaic alignment work on LLMs helps in those scenarios, but doesn’t do the whole job. There are quite separate issues in how LLM-based agents will be given explicit goals, and in whether those goals will be precise enough and reflexively stable in a system that gains full freedom and self-awareness. I would dearly love a few more collaborators in addressing that part of what still looks like the single most likely AGI scenario, and almost certainly the fastest route to AGI if it works.