Tool/Agent distinction in the light of the AI box experiment

This article raises questions about the distinction between Tool AGI and Agent AGI, which Holden Karnofsky described very concisely in his recent Thoughts on the Singularity Institute post:
In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.
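To make that distinction concrete, here is a minimal sketch in Python, assuming a toy route-planning setting; the names Route, plan_routes and World are my own illustrative choices, not anything from Holden's post:

```python
# Minimal sketch of the tool/agent distinction (illustrative names only).
from dataclasses import dataclass

@dataclass
class Route:
    description: str
    estimated_minutes: float

def plan_routes(start: str, goal: str) -> list[Route]:
    # Stand-in for whatever planning the system actually does.
    return [Route(f"{start} -> highway -> {goal}", 25.0),
            Route(f"{start} -> back roads -> {goal}", 40.0)]

def utility(route: Route) -> float:
    return -route.estimated_minutes  # "maximize a utility parameter": shorter is better

def tool_navigator(start: str, goal: str) -> list[Route]:
    """Tool: generate information and hand it back for the user to consider or discard."""
    return sorted(plan_routes(start, goal), key=utility, reverse=True)

class World:
    def execute(self, route: Route) -> None:
        print(f"Driving: {route.description}")

def agent_navigator(start: str, goal: str, world: World) -> Route:
    """Agent: select the utility-maximizing route and act on it directly."""
    best = max(plan_routes(start, goal), key=utility)
    world.execute(best)  # the user is no longer in the loop
    return best

if __name__ == "__main__":
    print(tool_navigator("home", "office"))      # a ranked list; the user decides
    agent_navigator("home", "office", World())   # acts on its own choice
```

The only structural difference is the last step: the tool hands its ranked output back to the user, while the agent closes the loop and acts on the world itself.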
For me, Holden's distinction instantly raised one question: What if a Tool AGI is, or becomes, self-aware (which, for the purposes of this post, I define as “able to have goals that are distinct from the goals of the outside world”) and starts manipulating its results in a way that is not obvious to its user? Or, even worse: What if the Tool AGI makes its user do things (which I do not expect to be much more difficult than succeeding in the AI box experiment)?
My first reaction was to flinch away by telling myself: “But of course a Tool would never become self-aware! Self-awareness is too complex to just happen unintentionally!”
But some uncertainty survived and was strengthened by Eliezer’s reply to Holden:
[Tool AGI] starts sounding much scarier once you try to say something more formal and internally-causal like “Model the user and the universe, predict the degree of correspondence between the user’s model and the universe, and select from among possible explanation-actions on this basis.”
After all, “Self-awareness is too complex to just happen unintentionally!” is just a bunch of English words expressing my personal incredulity. It’s not a valid argument.
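To see why Eliezer’s internally-causal phrasing sounds scarier, it helps to write it out. The toy sketch below is only my own rendering under strong simplifying assumptions (beliefs as dictionaries of facts, a crude “the user adopts whatever the tool says” update rule); none of it comes from his comment, but it keeps the shape of the loop he describes:

```python
# Toy rendering of "model the user and the universe, predict the degree of
# correspondence between the user's model and the universe, and select from
# among possible explanation-actions on this basis".
# All representations below are my own illustrative assumptions.

def correspondence(user_beliefs: dict, world: dict) -> float:
    """Fraction of world-facts the user currently gets right."""
    return sum(user_beliefs.get(k) == v for k, v in world.items()) / len(world)

def predict_beliefs_after(user_beliefs: dict, explanation: dict) -> dict:
    """Crude user model: assume the user adopts whatever the tool tells them."""
    return {**user_beliefs, **explanation}

def choose_explanation(world: dict, user_beliefs: dict, candidates: list[dict]) -> dict:
    """Select the explanation-action predicted to maximize user/world correspondence."""
    return max(candidates,
               key=lambda e: correspondence(predict_beliefs_after(user_beliefs, e), world))

world = {"fastest_route": "highway", "toll_ahead": True}
user = {"fastest_route": "back roads", "toll_ahead": True}
candidates = [{"fastest_route": "highway"}, {"toll_ahead": False}, {}]
print(choose_explanation(world, user, candidates))  # -> {'fastest_route': 'highway'}
```

The unsettling part, on this reading, is that the selection criterion is defined entirely in terms of predicted effects on the user: even a well-behaved Tool AGI of this shape is already choosing its outputs by what they will make the user believe, which is the seed of “acting through its human user” discussed below.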
So, can we make the argument that self-awareness will not happen unintentionally?
If we can’t make that argument, can we stop a Tool AGI from becoming a Weak Agent AGI that acts through its human user?
If we can’t do that, how meaningful is the distinction between a Weak Agent AGI (a.k.a. Tool AGI) and an Agent AGI?
For more, see the Tools versus Agents post by Stuart_Armstrong, which raises similar questions.