If my comment here correctly captures what is meant by “tool mode” and “agent mode”, then it seems to follow that AGI running in tool mode is no safer than the person using it.
If that’s the case, then an AGI running in tool mode is safer than an AGI running in agent mode if and only if agent mode is less trustworthy than whatever person ends up using the tool.
Are you assuming that’s true?
What you presented there (and here) is another theorem, something that should be proved (and published, if it hasn’t been yet). If true, this gives an estimate on how dangerous a non-agent AGI can be. And yes, since we have had a lot of time to study people and no time at all to study AGI, I am guessing that an AGI is potentially much more dangerous, because so little is known. Or at least that seems to be the whole point of developing provably friendly AI.
What? It sounds like a common-sensical¹ statement about tools in general and human nature, but not at all like something which could feasibly be expressed in mathematical form.
Footnote: ¹ This doesn’t mean it’s necessarily true, though.
No, because a person using a dangerous tool is still just a person, with limited speed of cognition, limited lifespan, and no capacity for unlimited self-modification.
A crazy dictator with a super-capable tool AI that tells him the best strategy to take over the world is still susceptible to assassination, and his plan, no matter how clever, cannot unfold faster than his victims are able to notice and react to it.
I suspect a crazy dictator with a super-capable tool AI would have unusually good counter-assassination plans, simplified by the reduced need for human advisors and managers of imperfect loyalty. Likewise, a medical expert system could provide gains to lifespan, particularly if it were backed up by the resources a paranoid megalomaniac in control of a small country would be willing to throw at a major threat.
Tool != Oracle.
At least, not by my understanding of tool.
My understanding of a super-capable tool AI is one that takes over the world if a crazy dictator directs it to, just like my understanding of a can opener tool is one that opens a can at my direction, rather than one that gives me directions on how to open a can.
Presumably it also augments the dictator’s lifespan, cognition, etc. if she asks, insofar as it’s capable of doing so.
More generally, my understanding of these concepts is that the only capability a tool AI lacks that an agent AI has is the capability of choosing goals to implement. So, if we’re assuming that an agent AI would be capable of unlimited self-modification in pursuit of its own goals, I conclude that a corresponding tool AI is capable of unlimited self-modification in pursuit of its user’s goals. It follows that it is not safe to assume a tool AI is incapable of augmenting its human user at that user’s direction.
(I should note that I consider a capacity for unlimited self-improvement relatively unlikely, for both tool and agent AIs. But that’s beside my point here.)
Agreed that a crazy dictator with a tool that will take over the world for her is safer than an agent capable of taking over the world, if only because the possibility exists that the tool can be taken away from her and repurposed, and it might not occur to her to instruct it to prevent anyone else from taking it or using it.
I stand by my statement that such a tool is no safer than the dictator herself, and that an AGI running in such a tool mode is safer than that AGI running in agent mode only if the agent mode is less trustworthy than the crazy dictator.
This seems to propose an alternate notion of ‘tool’ than the one in the article.
I agree with “tool != oracle” for the article’s definition.
Using your definition, I’m not sure there is any distinction between tool and agent at all, as per this comment.
I do think there are useful alternative notions to consider in this area, though, as per this comment.
And I do think there is a terminology issue. Previously I was saying “autonomous AI” vs “non-autonomous”.