Many of us have wondered why LLM-based agents are taking so long to be effective and in common use. One plausible reason that hadn’t occurred to me until now is that no one’s been able to make them robust against prompt injection attacks. Reading an article (‘Agent hijacking: The true impact of prompt injection attacks’) today reminded me of just how hard it is to defend against that for an agent out in the wild.
Counterevidence: based on my experiences in startup-land and the industry’s track record with Internet of Things (‘IoT: the S stands for security!‘), I’d expect at least a couple of startups to be offering LLM agents anyway, ones that are useful but paper over the security issues, and I haven’t seen that as yet. A July Forbes article points to Mindy and Ario as the leaders in the ‘personal assistant’ category; I had never heard of either before, which makes me think they’re not useful enough to get serious traction.
To me, the natural explanation is that they were not trained for sequential decision making and therefore lose coherence rapidly when making long term plans. If I saw an easy patch I wouldn’t advertise it, but I don’t see any easy patch—I think next token prediction works surprisingly well at producing intelligent behavior in contrast to the poor scaling of RL in hard environments. The fact that it hasn’t spontaneously generalized to succeed at sequential decision making (RL style) tasks is in fact not surprising but would have seemed obvious to everyone if not for the many other abilities that did arise spontaneously.
It’s also due to LLMs just not being reliable enough for anything more than say 90% reliability, which is generally unacceptable in a lot of domains that have any lasting impact.
That definitely seems like part of the problem. Sholto Douglas and Trenton Bricken make that point pretty well in their discussion with Dwarkesh Patel from a while ago.
It’ll be interesting to see whether the process supervision approach that OpenAI are reputedly taking with ‘Strawberry’ will make a bit difference to that. It’s a different framing (rewarding good intermediate steps) but seems arguably equivalent.
[Epistemic status: thinking out loud]
Many of us have wondered why LLM-based agents are taking so long to be effective and in common use. One plausible reason that hadn’t occurred to me until now is that no one’s been able to make them robust against prompt injection attacks. Reading an article (‘Agent hijacking: The true impact of prompt injection attacks’) today reminded me of just how hard it is to defend against that for an agent out in the wild.
Counterevidence: based on my experiences in startup-land and the industry’s track record with Internet of Things (‘IoT: the S stands for security!‘), I’d expect at least a couple of startups to be offering LLM agents anyway, ones that are useful but paper over the security issues, and I haven’t seen that as yet. A July Forbes article points to Mindy and Ario as the leaders in the ‘personal assistant’ category; I had never heard of either before, which makes me think they’re not useful enough to get serious traction.
To me, the natural explanation is that they were not trained for sequential decision making and therefore lose coherence rapidly when making long term plans. If I saw an easy patch I wouldn’t advertise it, but I don’t see any easy patch—I think next token prediction works surprisingly well at producing intelligent behavior in contrast to the poor scaling of RL in hard environments. The fact that it hasn’t spontaneously generalized to succeed at sequential decision making (RL style) tasks is in fact not surprising but would have seemed obvious to everyone if not for the many other abilities that did arise spontaneously.
It’s also due to LLMs just not being reliable enough for anything more than say 90% reliability, which is generally unacceptable in a lot of domains that have any lasting impact.
That definitely seems like part of the problem. Sholto Douglas and Trenton Bricken make that point pretty well in their discussion with Dwarkesh Patel from a while ago.
It’ll be interesting to see whether the process supervision approach that OpenAI are reputedly taking with ‘Strawberry’ will make a bit difference to that. It’s a different framing (rewarding good intermediate steps) but seems arguably equivalent.