This seems similar to Role Architectures, perhaps with the roles broken down into even more specialized functions.
I suspect lots of groups are already working on building these kinds of agents as fast as possible, for both research and commercial purposes. LangChain just raised a $10m seed round.
I’m less optimistic that such constructions will necessarily be safe. One reason is that these kinds of agents are very easy to build, especially compared to the work required to train the underlying foundation models: a small team or even a single programmer can put together a pretty capable agent using the OpenAI API and LangChain. These agents seem to have the potential for discontinuous jumps in capabilities when backed by more powerful foundation models, given access to more actions, or with improvements to their architecture. Small teams being capable of making discontinuous progress seems like a recipe for disaster.
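For a sense of how little code is involved, here is a minimal sketch of such an agent, assuming the early-2023 LangChain agent API and an OpenAI API key in the environment; the particular tools and query are just illustrative placeholders, not anything from the post above:

```python
# Minimal tool-using agent sketch (assumes `langchain`, `openai`, and OPENAI_API_KEY are set up).
from langchain.llms import OpenAI
from langchain.agents import AgentType, initialize_agent, load_tools

llm = OpenAI(temperature=0)                           # foundation model backing the agent
tools = load_tools(["serpapi", "llm-math"], llm=llm)  # web search + calculator tools (illustrative choices)

# ReAct-style loop: the LLM alternates between reasoning steps and tool calls.
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

agent.run("Find the population of Tokyo and express it as a percentage of the world population.")
```

Most of the capability lives in the foundation model and the tools; the "agent" itself is little more than a prompt template and a loop, which is exactly why swapping in a stronger model or a richer tool set can shift capabilities so abruptly.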
I agree. Do you know of any existing safety research on such architectures? It seems that aligning these kinds of systems could pose completely different challenges from aligning LLMs in general.