Fair enough; this sentiment only comes up offhand in comments and may not reflect the average opinion, so I may be misestimating where the community stands. I hope I am wrong, and I'm glad to see others don't agree!
I'm a bit puzzled by the average attitude among alignment workers toward LLM agents as a route to AGI. I'm still surprised there aren't more people working directly on aligning LLM agents. If lots of us really believed LLM agents are reasonably likely to get to AGI, I'd think we'd be working harder on the single most likely type of first AGI (especially since it's probably the fastest route if it doesn't plateau soon).
One possible answer is that people are hoping the large amount of work on aligning LLMs will cover aligning LLM agents. I think that work is helpful but not sufficient, so we need more thinking about aligning agents as distinct from aligning their base LLMs. I'm currently writing a post on this.
(Most people in AI Alignment work at scaling labs and are therefore almost exclusively working on LLM alignment. That said, I don't actually know what it means to work on LLM alignment as opposed to aligning other systems; it's not as though we have a ton of traction on LLM alignment, and most techniques and insights seem general enough not to be conditional specifically on LLMs.)
I think Seth is distinguishing “aligning LLM agents” from “aligning LLMs”, and complaining that there’s insufficient work on the former, compared to the latter? I could be wrong.
I don’t actually know what it means to work on LLM alignment over aligning other systems
Ooh, I can speak to this. I’m mostly focused on technical alignment for actor-critic model-based RL systems (a big category including MuZero and [I argue] human brains). And FWIW my experience is: there are tons of papers & posts on alignment that assume LLMs, and with rare exceptions I find them useless for the non-LLM algorithms that I’m thinking about.
As a typical example, I didn’t get anything useful out of Alignment Implications of LLM Successes: a Debate in One Act—it’s addressing a debate that I see as inapplicable to the types of AI algorithms that I’m thinking about. Ditto for the debate on chain-of-thought accuracy vs steganography and a zillion other things.
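For readers unfamiliar with the term, here is a deliberately minimal sketch of the actor-critic model-based RL pattern mentioned above: a learned world model predicts outcomes, an actor proposes actions, and a critic scores imagined futures. Everything in it (the toy environment, class names, the one-step planner) is an illustrative assumption, not code from MuZero or from anything else discussed in this thread.

```python
# Illustrative-only sketch of actor-critic model-based RL.
# All names and the toy 1-D environment are hypothetical.

import random

class ToyWorldModel:
    """Predicts (next_state, reward) for a toy 'move toward the goal' task."""
    def predict(self, state: int, action: int) -> tuple[int, float]:
        next_state = state + action                 # action is -1 or +1
        reward = 1.0 if next_state == 0 else 0.0    # goal is state 0
        return next_state, reward

class Critic:
    """Estimates the value of a state (closer to the goal is better)."""
    def value(self, state: int) -> float:
        return -abs(state)

class Actor:
    """Proposes candidate actions; a real actor would be a learned policy."""
    def propose(self, state: int) -> list[int]:
        return [-1, +1]

def plan_one_step(state, model, actor, critic):
    """Pick the proposed action whose imagined outcome the critic rates highest."""
    def score(action):
        next_state, reward = model.predict(state, action)
        return reward + critic.value(next_state)
    return max(actor.propose(state), key=score)

if __name__ == "__main__":
    model, actor, critic = ToyWorldModel(), Actor(), Critic()
    state = random.randint(-5, 5)
    for _ in range(10):
        action = plan_one_step(state, model, actor, critic)
        state, reward = model.predict(state, action)
        print(f"state={state:+d} reward={reward}")
```

The point of the sketch is only that the agent's behavior is driven by planning against a learned model and a learned value estimate, rather than by next-token prediction over text, which is why alignment arguments premised on LLM internals often don't transfer.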
When we get outside technical alignment to things like “AI control”, governance, takeoff speed, timelines, etc., I find that the assumption of LLMs is likewise pervasive, load-bearing, and often unnoticed.
I complain about this from time to time, for example Section 4.2 here, and also briefly here (the bullets near the bottom after “Yeah some examples would be:”).
I agree with the claim that the alignment techniques and insights usually under consideration, including my own plan for AI alignment, are not conditional on LLMs specifically.