I was also surprised that interpreting webpages was a major blocker. After all, they’re just text and HTML, as you say.
I don’t remember who said this, but I believed them because they’d actually tried to build useful agents: real modern webpages are such a flaming mess of complex HTML that LLMs get confused easily.
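To make the problem concrete (my own illustration, not anything from the people I’m paraphrasing): agent scaffolds usually don’t feed raw page source to the model, they first strip it down to visible text, and even that cleaned-up version is often long and messy. Something like the following sketch, assuming a BeautifulSoup-style cleanup step, is roughly what I have in mind:

```python
# Minimal sketch (my illustration): reduce a page to visible text before
# handing it to the model, since raw production HTML is mostly markup noise
# from the LLM's point of view.
import requests
from bs4 import BeautifulSoup

def page_to_llm_text(url: str, max_chars: int = 8000) -> str:
    """Fetch a page and reduce it to plain text an LLM can read more easily."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Drop tags that carry no visible content but dominate modern pages.
    for tag in soup(["script", "style", "noscript", "svg"]):
        tag.decompose()
    text = soup.get_text(separator="\n", strip=True)
    return text[:max_chars]  # truncate to keep the prompt within context limits
```

Even after stripping scripts and styles, a modern page can easily overflow the useful context window, which is part of why this is harder than “it’s just text and HTML” makes it sound.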
Your last point, whether steering toward easier-to-align AGI or buying more time to work on alignment is preferable, is a very complex issue. I don’t have a strong opinion since I haven’t worked through it all. But I think there are very strong reasons to think LLM-based AGI is far easier to align than other forms, particularly if the successful approach doesn’t rely heavily on RL. So I think your opinion is in the majority, but nobody has worked it through carefully enough to have a really good guess. That’s a project I’d like to embark on by writing a post making the controversial suggestion that maybe we should be actively building language model agent (LMA) AGI as the safest of a bad set of options.
I think that’d be a really valuable post!
I also think we’ll get substantial info about the feasibility of LMA in the next six months. Progress on ARC-AGI will tell us a lot about LLMs as general reasoners, I think (and Redwood’s excellent new work on ARC-AGI has already updated me somewhat toward this not being a fundamental blocker). And I think GPT-5 will tell us a lot. ‘GPT-4 comes just short of being capable and reliable enough to work well for agentic scaffolding’ is a pretty plausible view. If that’s true, then we should see such scaffolding working a lot better with GPT-5; if it’s false, then we should see continued failures to make it really work.
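For concreteness, by “agentic scaffolding” I mean roughly this kind of outer loop (a minimal sketch of my own, with placeholder call_llm and run_tool functions standing in for the model API and whatever tools the scaffold exposes):

```python
# Minimal sketch of an agentic scaffold (my illustration, not anyone's actual
# system): an outer loop that shows the model its goal plus the history of
# actions and observations, parses a proposed action, executes it, and repeats
# until the model declares it is finished.

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call (e.g., GPT-4 via an API client).
    # A canned reply keeps this example self-contained.
    return "FINISH: no real model attached"

def run_tool(action: str) -> str:
    # Placeholder for the scaffold's tools (browser, code runner, etc.).
    return f"(observation for action: {action})"

def run_agent(goal: str, max_steps: int = 10) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            + "\n".join(history)
            + "\nPropose the next action, or reply 'FINISH: <answer>' when done."
        )
        reply = call_llm(prompt)
        if reply.startswith("FINISH:"):
            return reply.removeprefix("FINISH:").strip()
        observation = run_tool(reply)
        history.append(f"Action: {reply}\nObservation: {observation}")
    return "Gave up after max_steps"  # reliability failures like this are the crux
```

The crux is the reliability of each step: if the model proposes a confused or malformed action even a few percent of the time, the loop compounds those errors over many steps, which is why a modest capability jump from GPT-4 to GPT-5 could plausibly flip this from “almost works” to “works.”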