Any sufficiently advanced tool is indistinguishable from [an] agent.
Let’s see if we can use concreteness to reason about this a little more thoroughly...
As I understand it, the nightmare looks something like this. I ask Google SuperMaps for the fastest route from NYC to Albany. It recognizes that computing this requires traffic information, so it diverts several self-driving cars to collect real-time data. Those cars run over pedestrians who were irrelevant to my query.
The obvious fix: forbid SuperMaps from altering anything outside its own scratch data. It works only with data already gathered. Later, a Google engineer might ask it what data would be more useful, or what courses of action might cheaply gather that data, but the engineer decides what, if anything, to actually do.
This superficially resembles a box, but there’s no actual box involved. The AI’s own code forbids plans like that.
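For concreteness, the "scratch data only" fix might be sketched like this. Everything here (RouteTool, plan_route, human_approves, the traffic data) is invented for illustration, not a real design: the tool computes over already-gathered data, writes only to its own scratch space, and may describe real-world actions but never performs them.

```python
# Hypothetical sketch: a tool-mode planner that cannot act on the world.
class RouteTool:
    def __init__(self, traffic_data):
        self.traffic_data = traffic_data  # data already gathered
        self.scratch = {}                 # the only state the tool may mutate

    def plan_route(self, origin, dest):
        # Pure computation over existing data; writes only to scratch.
        self.scratch["last_query"] = (origin, dest)
        roads = self.traffic_data.get((origin, dest), ["no data"])
        return min(roads, key=len)  # stand-in for a real shortest-path search

    def suggest_data_gathering(self):
        # The tool may *describe* useful actions; it never performs them.
        return ["deploy sensor on I-87", "poll cell-tower density"]

def human_approves(action):
    # The Google engineer decides what, if anything, to actually do.
    return False

tool = RouteTool({("NYC", "Albany"): ["I-87 N", "Taconic Pkwy"]})
print(tool.plan_route("NYC", "Albany"))  # -> I-87 N
for action in tool.suggest_data_gathering():
    if human_approves(action):
        pass  # only here would anything touch the outside world
```

The point of the sketch is structural: no code path mutates anything except `self.scratch`, so diverting self-driving cars is not merely forbidden by a filter, it is outside the program's vocabulary.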
But that’s for a question-answering tool. Let’s take another scenario:
I tell my super-intelligent car to take me to Albany as fast as possible. It sends emotionally manipulative emails to anyone else who would otherwise be on the road encouraging them to stay home.
I don’t see an obvious fix here.
So the short answer seems to be that it matters what the tool is for. A purely question-answering tool would be extremely useful, but not as useful as a general purpose one.
Could humans with an oracular super-AI police the development and deployment of active super-AIs?
I believe that HK’s post explicitly characterizes anything active like this as having agency.
I think the correct objection is something you can’t quite see in Google Maps. If you program an AI to do nothing but output directions, it will do nothing but output directions. If those directions are for driving, you’re probably fine. If those directions are big and complicated plans for something important, which you follow without really understanding why you’re doing them (and this is where most of the benefits of working with an AGI will show up), then you could unknowingly take over the world by following a sufficiently clever scheme.
Also note that it would be a lot easier for the AI to pull this off if you let it tell you how to improve its own design. If recursively self-improving AI blows other AI out of the water, then tool AI is probably not safe unless it is made ineffective.
This does actually seem like it would raise the bar of intelligence needed to take over the world somewhat. It is unclear how much. The topic seems to me to be worthy of further study/discussion, but not (at least not obviously) a threat to the core of SIAI’s mission.
It also helps that Google Maps does not have general intelligence, so its model does not include the user’s reactions to its output, the user’s consequent actions in the real world, and so on, as variables that may influence the quality of the solution and that therefore could (and should) be optimized (within the constraints of the user’s psychology), if possible.
In short: Google Maps does not manipulate you, because it does not see you.
A generally smart Google Maps might not manipulate you, because it has no motivation to do so.
It’s hard to imagine how commercial services would work when they’re powered by GAI. (If you asked a GAI version of Google Maps a question that’s unrelated to maps, e.g. “What’s a good recipe for cheesecake?”, would it tell you to ask Google Search instead? Would it defer to Google Search and forward the answer to you? Would it just figure out the answer anyway, since it’s generally intelligent? Would the company Google simply collapse all services into a single “Google” brand, rather than have “Google Search”, “Google Mail”, “Google Maps”, etc., and have that single brand be powered by a single GAI?) But let’s stick to the topic at hand and assume there’s a GAI named “Google Maps”, and you’re asking “How do I get to Albany?”
Given this use-case, would the engineers who developed the Google Maps GAI more likely give it a utility function like “Maximize the probability that your response is truthful”, or something closer to “Always respond with a set of directions that are legal in the jurisdictions within which they are to be followed and that, if followed by the user, would get the user to the destination while minimizing cost/time/complexity (depending on the user’s preferences)”?
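The second, more constrained utility amounts to filtering candidate direction-sets by hard constraints (legality, actually arriving) and then optimizing the remainder. A toy sketch, with entirely made-up candidates and fields:

```python
# Hypothetical candidate direction-sets for the NYC -> Albany query.
candidates = [
    {"route": "I-87 N",        "legal": True,  "arrives": True,  "hours": 2.5},
    {"route": "shoulder lane", "legal": False, "arrives": True,  "hours": 2.0},
    {"route": "scenic detour", "legal": True,  "arrives": True,  "hours": 4.0},
]

def pick_directions(cands):
    # Hard constraints first, then minimize the user's chosen cost (time here).
    feasible = [c for c in cands if c["legal"] and c["arrives"]]
    return min(feasible, key=lambda c: c["hours"])

print(pick_directions(candidates)["route"])  # -> shoulder lane is excluded
```

Note that a bare “maximize truthfulness” utility would happily return the faster but illegal shoulder-lane answer; the constraints are doing the safety work.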
This was my thought as well: an automated vehicle is in “agent” mode.
The example also demonstrates why an AI in agent mode is likely to be more useful (in many cases) than an AI in tool mode. Compare using Google Maps to find a route to the airport versus just jumping into a taxi cab and saying “Take me to the airport”. Since agent-mode AI has uses, it is likely to be developed.
Then it’s running in agent mode? My impression was that a tool-mode system presents you with a plan, but takes no actions. So all tool-mode systems are basically question-answering systems.
Perhaps we can meaningfully extend the distinction to some kinds of “semi-autonomous” tools, but that would be a different idea, wouldn’t it?
(Edit) After reading more comments, here is “a different idea” which seems to match this kind of desire… http://lesswrong.com/lw/cbs/thoughts_on_the_singularity_institute_si/6jys
I’m a sysadmin. When I want to get something done, I routinely come up with something that answers the question, and when it does that reliably I give it the power to do stuff on as little human input as possible. Often in daemon mode, to absolutely minimise how much it needs to bug me. Question-answerer->tool->agent is a natural progression just in process automation. (And this is why they’re called “daemons”.)
It’s only long experience and many errors that have taught me how to do this so that the created agents won’t crap all over everything. Even then I still get surprises.
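The question-answerer → tool → agent progression can be shown in miniature. The disk data and actions below are faked for illustration; a real script would call `shutil.disk_usage` or parse `df` output and actually rotate logs:

```python
# Toy version of the automation progression described above.
def disk_usage():                       # stage 0: fake sensor
    return {"/": 0.42, "/var/log": 0.95}

def answer():                           # stage 1: question-answerer
    return [p for p, used in disk_usage().items() if used > 0.9]

def act(paths):                         # stage 2: tool, run on request
    return [f"rotated logs under {p}" for p in paths]

def daemon_tick():                      # stage 3: agent, runs unattended
    return act(answer())

print(daemon_tick())  # -> ['rotated logs under /var/log']
```

Each stage is a thin wrapper around the previous one, which is exactly why the progression feels natural: once `answer()` is trusted, wiring it to `act()` and a scheduler is a few lines.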
Well, do your ‘agents’ build a model of the world whose fidelity they improve? I don’t think those are really agents in the AI sense, and definitely not in the self-improvement sense.
They may act according to various parameters they read in from the system environment. I expect they will be developed to a level of complication where they have something that could reasonably be termed a model of the world. The present approach is closer to perceptual control theory, where the sysadmin has the model and PCT is part of the implementation. ’Cos it’s more predictable to the mere human designer.
Capacity for self-improvement is an entirely different thing, and I can’t see a sysadmin wanting that—the sysadmin would run any such improvements themselves, one at a time. (Semi-automated code refactoring, for example.) The whole point is to automate processes the sysadmin already understands but doesn’t want to do by hand—any sysadmin’s job being to automate themselves out of the loop, because there’s always more work to do. (Because even in the future, nothing works.)
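A minimal sketch of the perceptual-control-style arrangement described above, assuming a made-up mail-queue example: the sysadmin supplies the reference value (the model lives in the sysadmin's head), and the daemon merely perceives the current value and acts to shrink the error.

```python
# Perceptual-control-style daemon loop; all numbers are invented.
REFERENCE = 10          # sysadmin's model: mail queue should sit near 10

queue = 25              # perceived state of the world (faked)

def perceive():
    return queue

def act(error):
    global queue
    queue -= max(0, error) // 2   # drain half the excess per tick

for _ in range(5):                # daemon ticks
    act(perceive() - REFERENCE)

print(queue)  # settles near the reference
```

The daemon never models *why* the reference is 10; that predictability to the human designer is the point of the arrangement.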
I would be unsurprised if someone markets a self-improving system for this purpose. For it to go FOOM, it also needs to invent new optimisations, which is presently a bit difficult.
Edit: And even a mere daemon-like automated tool can do stuff a lot of people regard as unFriendly, e.g. high-frequency trading algorithms.
It’s not a natural progression in the sense of occurring without human intervention. That is rather relevant if the idea of AI safety is going to be based on using tool AI strictly as tool AI.
My own impression differs.
It becomes increasingly clear that “tool” in this context is sufficiently subject to different definitions that it’s not a particularly useful term.
I’ve been assuming the definition from the article. I would agree that the term “tool AI” is unclear, but I would not agree that the definition in the article is unclear.