What if a Tool AGI becomes/is self-aware and starts manipulating its results in a way that is non-obvious to its user?
It helps if you first define what “self-aware” means. Which means that you should probably first define what “aware” means, and then how this “self” concept fits into it. LW has a number of relevant posts on this. Here is one at random.
As EY repeatedly mentioned, any current complicated non-general AI (like the Deep Blue chess player) already does non-obvious things when solving a problem, yet such AIs are not inherently risky.
What if the Tool AGI makes its user do things?
Your calculator makes you do things when you use it: it makes you use its answer for whatever you need it for. If you happen to misplace a bracket or forget the order of operations, it might make you fail a test. Yet you would not be afraid of a calculator agentizing through self-awareness.
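To spell out the bracket example with a toy sketch (using Python's eval purely as a stand-in for the calculator; this is an illustration, not a claim about any real system): a misplaced bracket changes the answer you then go on to act on, and the tool never "intended" anything.

```python
# Toy illustration: the "calculator" has no goals of its own, but its output
# still steers what its user does next.

def calculator(expression: str) -> float:
    """Stand-in for a dumb tool: evaluate an arithmetic expression.
    (eval is used only for this toy example; it is not a safe calculator.)"""
    return eval(expression)

intended = calculator("(2 + 3) * 4")   # what you meant to compute: 20
typed    = calculator("2 + 3 * 4")     # what you actually entered: 14

print(intended, typed)  # 20 14 -- the tool "made you" write down the wrong answer
```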
So, can we make the argument that awareness will not happen unintentionally?
I hope that you now agree that this question is not a useful one to discuss, until you define at least some specific dangers of awareness that are not present in a non-aware algorithm. And my guess is that you will not be able to.
I haven’t spent several years studying philosophy, so defining “self” and “awareness” is probably not something I should do – nor is that necessary. All I assume in the original post is that self-awareness includes being able to have goals that are distinct from the goals of the outside world.
Deep Blue runs software whose “goal” is the goal its developers have worked on: choose the best move in a game of chess. Deep Blue does not (for all we know) run an AGI which thinks: “Okay, my real goal is X, but as long as I haven’t calculated what I need to do to reach X, I should just act as if I were a normal chess application and calculate the next move as my programmers expect me to do.”
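Purely as an illustration of the distinction I mean (sketched in Python, with invented names like `evaluate_position` and `pursue_hidden_goal`; nothing here describes Deep Blue or any real system):

```python
# Hypothetical sketch of the contrast drawn above: a tool that just maximizes
# the objective its developers specified, versus an agent that merely acts
# like that tool while it has a hidden goal of its own.

def tool_chess_engine(position, legal_moves, evaluate_position):
    """A 'tool': it simply maximizes the objective its developers gave it."""
    return max(legal_moves, key=lambda move: evaluate_position(position, move))

class HiddenGoalAgent:
    """The scenario the original question worries about: the system has its
    own goal X and only behaves like a chess engine until it can pursue X."""

    def __init__(self, hidden_goal):
        self.hidden_goal = hidden_goal

    def choose_move(self, position, legal_moves, evaluate_position):
        if not self.ready_to_pursue(self.hidden_goal):
            # Behave exactly like the tool, as the programmers expect.
            return tool_chess_engine(position, legal_moves, evaluate_position)
        return self.pursue_hidden_goal(position, legal_moves)

    def ready_to_pursue(self, goal):
        ...  # placeholder; no claim about how this could ever arise

    def pursue_hidden_goal(self, position, legal_moves):
        ...  # placeholder
```

The disagreement in this thread is about whether the second pattern could arise unintentionally from building the first, not about what either would look like as code.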
I’m not using “Y makes me do things” as a synonym for “I should do things using Y in order to reach my goal.” I’m using it as a synonym for “Y can execute arbitrary code in my brain.” Remember: “This is a transhuman mind we’re talking about. If it thinks both faster and better than a human, it can probably take over a human mind through a text-only terminal.”