Thanks for your post! I think it gets to the heart of the issue, and I like the term “tool AI” as a phrase that points to what we actually want to build. I share your fear of racing with China to build ASI.
As if this were not bad enough, it is now rather obvious that the first AGI will not be a pure LLM, but a hybrid scaffolded system.
I think scaffolding systems are great. A scaffold lets us inspect how the agent is using its tools. For example, a scaffold could have a monitor that watches all of the calls to the code compiler to ensure the code doesn’t look suspicious. The same monitoring system could read all of the text the model commits to long-term memory and ensure it isn’t planning something bad or steganographically hiding information. A minimal sketch of what such a chokepoint could look like is below.
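Here is a rough Python sketch of the monitoring idea, assuming a hypothetical scaffold where every tool call and memory write passes through a single chokepoint. The names (`MonitoredScaffold`, `looks_suspicious`) are mine, not from the post, and the keyword check is just a stand-in for whatever trained monitor model you would actually use:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MonitoredScaffold:
    # Tools the agent is allowed to use, e.g. {"compile": run_compiler}
    tools: dict[str, Callable[[str], str]]
    memory: list[str] = field(default_factory=list)
    flagged: list[str] = field(default_factory=list)

    def looks_suspicious(self, text: str) -> bool:
        # Placeholder policy: in practice this would be a separate monitor
        # model or human review, not a keyword list.
        banned = ("exfiltrate", "disable logging", "copy weights")
        return any(phrase in text.lower() for phrase in banned)

    def call_tool(self, name: str, payload: str) -> str:
        # Every tool call is inspected before it runs.
        if self.looks_suspicious(payload):
            self.flagged.append(f"tool:{name}: {payload}")
            raise PermissionError(f"blocked suspicious call to {name}")
        return self.tools[name](payload)

    def write_memory(self, text: str) -> None:
        # Long-term memory writes go through the same monitor.
        if self.looks_suspicious(text):
            self.flagged.append(f"memory: {text}")
            raise PermissionError("blocked suspicious memory write")
        self.memory.append(text)
```

The point is just that the scaffold, not the model, owns the interface to tools and memory, so every action the agent takes leaves an inspectable trace.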
While it might be harder to formally “align” these systems, I think it will be easier to make safety cases for why they won’t “escape the data center” or “hack the missiles”. It will be much harder for the AI to do anything evil when we watch its every move. A superintelligence would likely still escape the scaffold, but it seems unlikely that an AGI could.
I think scaffold systems could be the future of tool AI if we use them carefully. These hybrid systems could unlock impressive new capabilities for humans without needing to solve the hard “alignment problem.” This does not