What do you make of the extensive arguments that tool AI is not actually safer than other forms of AI, and only looks that way on the surface by ignoring issues of instrumental convergence to power-seeking and the capacity for tool AI to do extensive harm even while under human control? (See the Tool AI page for links to many posts tackling this question from different angles.)
(Also, for what it’s worth, I was with you until the Tool AI part. I would have liked this better if it had been split between one post arguing what’s wrong with entente and one post arguing what to do instead.)
Excellent question, Gordon! I defined tool AI specifically as controllable, so AI without a quantitative guarantee that it’s controllable (or “safe”, as you write) wouldn’t meet the safety standards and its release would be prohibited. I think it’s crucial that, just as for aviation and pharma, the onus is on the companies rather than the regulators to demonstrate that products meet the safety standards. For controllable tools with great potential for harm (say plastic explosives), we already have regulatory approaches for limiting who can use them and how. Analogously, there’s discussion at the UNGA this week about creating a treaty on lethal autonomous weapons, which I support.
I defined tool AI specifically as controllable, so AI without a quantitative guarantee that it’s controllable (or “safe”, as you write) wouldn’t meet the safety standards and its release would be prohibited.
If your stated definition is really all you mean by tool AI, then you’ve defined tool AI in a very nonstandard way that will confuse your readers.
When most people hear “tool AI”, I expect them to think of AI like hammers: tools they can use to help them achieve a goal, but that aren’t agentic and won’t do anything on their own that they weren’t directly asked to do.
You seem to have adopted a definition of “tool AI” that actually means “controllable and goal-achieving AI” but gives no consideration to agency, so I can only conclude from your writing that you would count AI agents as tools, even if they operated independently, so long as they could be controlled in some sense (exactly what sense of control you mean is never specified). This is not what most people would expect someone to mean by a “tool”.
Again, I like all the reasoning about entente, but this use of the word “tool AI” is confusing, maybe even deceptive (I assume that was not the intent!). It also leaves me feeling like your “solution” of tool AI is nothing other than a rebrand of what we’ve already been talking about in the field variously as safe, aligned, or controllable AI, which I guess is fine, but “tool AI” is a confusing name for that. This also further downgrades my opinion of the solution section, since as best I can tell it’s just saying “build AI safely” without enough details to be actionable.
Even if tool AI is controllable, tool AI can be used to assist in building non-tool AI. A benign superassistant is one query away from outputting world-ending code.
Right, Tamsin: so reasonable safety standards would presumably ban fully unrestricted superassistants too, but allow more limited assistants that could still be incredibly helpful. I’m curious what AI safety standards you’d propose – it’s not a hypothetical question, since many politicians would like to know.
(Defining Tool AI as a program that would evaluate the answer to a question given available data without seeking to obtain any new data, and then shut down after having discovered the answer.) While those arguments (if successful) show that it’s harder to program a Tool AI than it might look at first, so AI alignment research is still something that should be actively pursued (and I doubt Tegmark thinks AI alignment research is useless), they don’t really address the point that making aligned Tool AIs is still in some sense “inherently safer” than making Friendly AGI, because the lack of a singleton scenario means you don’t need to solve all moral and political philosophy from first principles in your garage in 5 years and hope you “get it right” the first time.