Eliezer argued that looking at modern software does not support Holden’s claim that powerful tool AI is likely to come before dangerous agent AI. I’m not sure the examples he gave support his claim, especially if we broaden the “tool” concept in a way that seems consistent with Holden’s arguments. I’m not too sure about this, but I would like to hear reactions.
Eliezer:
At one point in his conversation with Tallinn, Holden argues that AI will inevitably be developed along planning-Oracle lines, because making suggestions to humans is the natural course that most software takes. Searching for counterexamples instead of positive examples makes it clear that most lines of code don’t do this. Your computer, when it reallocates RAM, doesn’t pop up a button asking you if it’s okay to reallocate RAM in such-and-such a fashion. Your car doesn’t pop up a suggestion when it wants to change the fuel mix or apply dynamic stability control. Factory robots don’t operate as human-worn bracelets whose blinking lights suggest motion. High-frequency trading programs execute stock orders on a microsecond timescale.
Whether this kind of software counts as agent-like software or tool software depends on what we mean by “tool.” Holden glosses the distinction as follows:
In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.
With the distinction defined this way, it seems that most of this software is neither agent-like software nor tool software. I suggested an alternative definition in another comment:
An agent models the world and selects actions in a way that depends on what its modeling says will happen if it selects a given action.
A tool may model the world, and may select actions depending on its modeling, but may not select actions in a way that depends on what its modeling says will happen if it selects a given action.
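To make this distinction concrete, here is a toy sketch in Python of what I have in mind. It is purely illustrative (the world model and preference function are made-up placeholders), not a claim about how any real system is built:

```python
# Toy illustration of the proposed agent/tool distinction.
# Both programs use the same (made-up) world model; only the agent
# picks its action based on what the model says will happen if it
# takes that action.

def predict_outcome(state, action):
    # Stand-in world model: an action just shifts the state.
    return state + action

def agent_step(state, possible_actions, utility):
    # Agent: select the action whose *predicted* outcome scores best.
    return max(possible_actions,
               key=lambda a: utility(predict_outcome(state, a)))

def tool_step(state, possible_actions, utility):
    # Tool: use the same model to report predicted outcomes,
    # leaving the choice of action to the human.
    return {a: utility(predict_outcome(state, a)) for a in possible_actions}

if __name__ == "__main__":
    score = lambda s: -abs(s - 12)            # made-up preference
    print(agent_step(10, [-1, 0, 1], score))  # agent acts: picks 1
    print(tool_step(10, [-1, 0, 1], score))   # tool reports: {-1: -3, 0: -2, 1: -1}
```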
In this sense, I think all of Eliezer’s examples of software are tool-like rather than agent-like (qualification: I don’t know enough about high-frequency trading to say whether this is true there as well). I don’t see these examples as strong support for the view that agent-like AGI is the default outcome.
More Eliezer:
Software that does happen to interface with humans is selectively visible and salient to humans, especially the tiny part of the software that does the interfacing; but this is a special case of a general cost/benefit tradeoff which, more often than not, turns out to swing the other way, because human advice is either too costly or doesn’t provide enough benefit. Modern AI programmers are generally more interested in e.g. pushing the technological envelope to allow self-driving cars than to “just” do Google Maps.
It’s clearly right that software does a lot of things without getting explicit human approval, and there are control/efficiency tradeoffs that explain why this is so. However, I suspect that self-driving cars are also not agents by Holden’s definition, or by the one I proposed, and don’t give a lot of support to the view that AGI will be agent-like. All this should be taken with a grain of salt, since I don’t know too much about these cars. But I’m imagining that they work by having a human select a destination, displaying a route, having the human accept the route, and then following a narrow set of rules to get there (e.g., stop if there’s a red light at such-and-such a distance in front of you, brake if there’s an object meeting such-and-such characteristics in your trajectory, etc.). I think the crucial thing here is the step where the human gets a helpful summary and then approves. That seems to fit my expansion of the “tool” concept, and it fits Holden’s picture in the most important way: this car isn’t going to do anything too crazy without our permission.
However, I can see an argument that advanced versions of this software would be made more agent-like, in order to handle split-second situations where the software has to decide what to do and the right response couldn’t easily have been specified in advance, such as whether to make some emergency maneuver to avoid an infrequent sort of collision. Perhaps more examples of this kind would turn up if we thought about it; high-frequency trading sounds like a good potential case.
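To make the picture in the last two paragraphs concrete, here is a toy sketch: route suggestion, human approval, then narrow rule-following. This is purely illustrative; I’m not claiming real self-driving systems are structured this way, and every function here is a made-up placeholder:

```python
# Toy sketch of the tool-like picture of a self-driving car described
# above: suggest a route, get human approval, then follow narrow rules.

def plan_route(start, destination):
    # Stand-in for a routing computation (the Google Maps-like part).
    return [start, "Main St", "2nd Ave", destination]

def follow_rules(segment, sensor_reading):
    # Narrow, pre-specified rules rather than outcome-driven choice.
    # (An agent-like extension would instead predict and score the
    # outcomes of candidate maneuvers in split-second situations.)
    if sensor_reading == "red light":
        return "stop"
    if sensor_reading == "obstacle":
        return "brake"
    return "proceed along " + segment

def drive(start, destination, approve, read_sensors):
    route = plan_route(start, destination)
    if not approve(route):                  # the crucial human sign-off step
        return "route rejected"
    return [follow_rules(seg, read_sensors()) for seg in route]

if __name__ == "__main__":
    print(drive("Home", "Office",
                approve=lambda route: True,
                read_sensors=lambda: "clear road"))
```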
More Eliezer:
Branches of AI that invoke human aid, like hybrid chess-playing algorithms designed to incorporate human advice, are a field of study; but they’re the exception rather than the rule, and occur primarily where AIs can’t yet do something humans do, e.g. humans acting as oracles for theorem-provers, where the humans suggest a route to a proof and the AI actually follows that route.
Quick thought: If it’s hard to get AGIs to generate plans that people like, then it would seem that AGIs fall into this exception class, since in that case humans can do a better job of telling whether they like a given plan.
Factory robots and high-frequency traders are definitely agent AI. They are designed to be, and they frankly make no sense in any other way.
The factory robot does not ask you whether it should move three millimeters to the left; it does not suggest that perhaps moving three millimeters to the left would be wise; it moves three millimeters to the left, because that is what its code tells it to do at this phase in the welding process.
The high-frequency trader even has a utility function: It’s called profit, and it seeks out methods of trading options and derivatives to maximize that utility function.
In both cases, these are agents, because they act directly on the world itself, without a human intermediary approving their decisions.
The only reason I’d even hesitate to call them agent AIs is that they are so stupid; the factory robot has hardly any degrees of freedom at all, and the high-frequency trader only has choices between different types of financial securities (it never asks whether it should become an entrepreneur, for instance). But that’s a question about the “AI” part; they’re definitely agents rather than tools.
I do like your quick thought though:
Quick thought: If it’s hard to get AGIs to generate plans that people like, then it would seem that AGIs fall into this exception class, since in that case humans can do a better job of telling whether they like a given plan.
Yes, it makes a good deal of sense that we would want some human approval involved in the process of restructuring human society.
They’re clearly agents given Holden’s definitions. Why are they clearly agents given my proposed definition? (Normally I don’t see a point in arguing about definitions, but I think my proposed definition lines up with something of interest: things that are especially likely to become dangerous if they’re more powerful.)
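For concreteness, here is the kind of contrast I have in mind when applying my proposed definition to these examples. This is a toy sketch, not a claim about how real factory robots or trading systems actually work:

```python
# Toy contrast under the proposed definition. Both programs act directly
# on the world without human approval; only the second selects its action
# based on what a model predicts will happen if it takes that action.

def scripted_robot_step(phase):
    # Follows a fixed welding script: the action depends only on the
    # current phase of the process, not on predicted consequences.
    script = {"position": "move 3mm left", "weld": "lower torch"}
    return script[phase]

def predictive_trader_step(candidate_orders, predict_profit):
    # Chooses the order whose *predicted* profit is highest.
    return max(candidate_orders, key=predict_profit)

if __name__ == "__main__":
    print(scripted_robot_step("position"))
    # Made-up profit forecasts, for illustration only.
    forecasts = {"buy A": 1.0, "buy B": 2.5, "hold": 0.0}
    print(predictive_trader_step(list(forecasts), forecasts.get))
```

The question above is essentially which of these two shapes the real systems have.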