We may need another word for “agent with intentionality”—the way the word “agent” is conventionally used is closer to “daemon”, i.e. a tool set to run without user intervention.
I’m not sure even having a world-model is a relevant distinction—I fully expect sysadmin tools to be designed to form something that could reasonably be called a world model within my working lifetime (which means I’d be amazed if they don’t exist now). A moderately complex Puppet-run system can already be a bit spooky.
Note that mere daemon-level tools exist that many already consider unFriendly, e.g. high-frequency trading systems.
A more mundane example:
The Roomba cleaning robot is scarcely an agent. While running, it does not build up a model of the world; it only responds to immediate stimuli (collisions, cliff detection, etc.) and generates a range of preset behaviors, some of them random.
It has some senses about itself — it can detect a jammed wheel, and the “smarter” ones will return to dock to recharge if the battery is low, then resume cleaning. But it does not have a variable anywhere in its memory that indicates how clean it believes the room is — an explicit representation of a utility function of cleanliness, or “how well it has done at its job”. It does, however, have a sensor for how dirty the carpet immediately below it is, and it will spend extra time on cleaning especially dirty patches.
Because it does not have beliefs about how clean the room is, it can’t have erroneous beliefs about that either — it can’t become falsely convinced that it has finished its job when it hasn’t. It just keeps sweeping until it runs out of power. (We can imagine a paperclip-robot that doesn’t think about paperclips; it just goes around finding wire and folding it. It cannot be satisfied, because it doesn’t even have a term for “enough paperclips”!)
It is scarcely an agent. To me it seems even less “agenty” than an arbitrage daemon, but that probably has more to do with the fact that it’s not designed to interact with other agents. But you can set it on the floor and push the go button, and in an hour come back to a cleaner floor. It doesn’t think it’s optimizing anything, but its behavior has the result of being useful for optimizing something.
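For concreteness, the purely reactive design described above might be sketched roughly as follows. This is a toy illustration only: the robot interface and every sensor call in it are hypothetical stand-ins, not iRobot's actual firmware.

```python
import random

class ReactiveCleaner:
    """A toy, purely reactive cleaner: no map, no belief about how clean
    the room is, just immediate stimuli mapped to preset behaviors."""

    def __init__(self, robot):
        self.robot = robot  # hypothetical hardware interface

    def step(self):
        # Reflexes about itself: reactions to sensor readings,
        # not conclusions drawn from any model of the world.
        if self.robot.wheel_jammed():
            self.robot.stop_and_beep()
        elif self.robot.battery_low():
            self.robot.drive_toward_dock()            # the "smarter" models
        elif self.robot.cliff_detected() or self.robot.bumped():
            self.robot.back_up()
            self.robot.turn(random.uniform(20, 160))  # preset, partly random
        elif self.robot.dirt_sensor_triggered():
            self.robot.spot_clean()                   # extra passes on dirty patches
        else:
            self.robot.drive_forward()                # default behavior

    def run(self):
        # Note what is absent: no cleanliness estimate, no "job finished" test.
        # It just runs until the battery gives out.
        while not self.robot.battery_empty():
            self.step()
```

Nothing in that loop represents “how clean the room is”; whatever cleaning it achieves falls out of the reflexes, which is the sense in which its behavior is useful for optimizing something it has no concept of.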
Whether an entity builds up a model of the world, or is self-aware or self-protecting, is to some extent an implementation detail, which is different from the question of whether we want to live around the consequences of that entity’s actions.
The agent/tool distinction is in the map, not the territory — it’s a matter of adopting the intentional stance toward whatever entity we’re talking about. To some extent, saying “agent” means treating the entity as a black box with a utility function printed on the outside: “the print spooler wants to send all the documents to the printer” — or “this Puppet config is trying to put the servers in such-and-so state …”
My Roomba does not just keep sweeping until it runs out of power. It terminates quickly in a small space and more slowly in a large one. To terminate, it must somehow sense the size of the space it is working in and compare that to some register of how long it has operated.
Roombas try to build up a (very limited) model of how big the room is from the longest uninterrupted traversal they can sense. See “Can you tell me more about the cleaning algorithm that the Roomba uses?” in http://www.botjunkie.com/2010/05/17/botjunkie-interview-nancy-dussault-smith-on-irobots-roomba/
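Going only by that description, a toy version of the heuristic might look like the sketch below; the constants and names are invented, and this is one reading of the interview rather than the actual algorithm.

```python
def estimated_runtime_s(longest_straight_run_m: float) -> float:
    """Scale the target cleaning time with a crude proxy for room size."""
    # Treat the longest uninterrupted traversal as a proxy for the room's
    # linear dimension, so the area proxy goes with its square.
    area_proxy_m2 = longest_straight_run_m ** 2
    SECONDS_PER_M2 = 60.0      # made-up constant
    MIN_RUNTIME_S = 5 * 60.0   # made-up floor so tiny spaces still get a pass
    return max(MIN_RUNTIME_S, area_proxy_m2 * SECONDS_PER_M2)

def should_stop(elapsed_s: float, longest_straight_run_m: float) -> bool:
    # Compare the estimate to the register of how long the robot has operated.
    return elapsed_s >= estimated_runtime_s(longest_straight_run_m)
```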
Oh, cool.
*updates*
It’s very hard to avoid apparent teleology when speaking in English. (This is particularly troublesome when talking about evolution by natural selection, where the assumption of teleology is the number one barrier to comprehending how it actually works.)
Very good point on the need for another word.
Thinking about it, what we need is an understanding of the enormous gap between the software we design when we have some intent in mind, and the fulfillment of that intent itself.
For example, if I have the intent to get from point A to point B over some terrain, I could build a solution consisting of two major parts:
a perceiver tool that builds and updates a map of the terrain
a solver tool that minimizes some parameter over a path through this terrain (some discomfort metric, combined with time, risk of death, etc.), as sketched just below. [edit: please note that this terrain is not the real-world terrain]
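For concreteness, here is a minimal sketch of that two-part split under invented assumptions: a grid cost map, arbitrary weights for discomfort and risk, and plain Dijkstra as the solver. None of it is a real planner; the point is only that the solver optimizes over the perceiver's map, never over the world itself.

```python
import heapq

class Perceiver:
    """Builds and updates a cost map of the terrain (a model, not the terrain itself)."""

    def __init__(self, width, height):
        self.cost = [[1.0] * width for _ in range(height)]  # default traversal cost

    def update(self, observations):
        # observations: iterable of (x, y, discomfort, risk_of_death) readings
        for x, y, discomfort, risk in observations:
            # Fold the metrics into a single cell cost; the weights are arbitrary.
            self.cost[y][x] = 1.0 + discomfort + 100.0 * risk

class Solver:
    """Minimizes total cost over a path through the Perceiver's map (Dijkstra)."""

    def plan(self, cost, start, goal):
        h, w = len(cost), len(cost[0])
        best = {start: 0.0}
        prev = {}
        frontier = [(0.0, start)]
        while frontier:
            d, node = heapq.heappop(frontier)
            if node == goal:
                break
            if d > best[node]:
                continue  # stale queue entry
            x, y = node
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < w and 0 <= ny < h:
                    nd = d + cost[ny][nx]
                    if nd < best.get((nx, ny), float("inf")):
                        best[(nx, ny)] = nd
                        prev[(nx, ny)] = node
                        heapq.heappush(frontier, (nd, (nx, ny)))
        # Walk back from goal to start (assumes the goal was reachable).
        path, node = [], goal
        while node != start:
            path.append(node)
            node = prev[node]
        path.append(start)
        return list(reversed(path))

# Usage: the solver only ever sees the perceiver's numbers, never the world.
perceiver = Perceiver(width=20, height=20)
perceiver.update([(5, 5, 3.0, 0.0), (5, 6, 0.0, 0.9)])  # hypothetical sensor readings
route = Solver().plan(perceiver.cost, start=(0, 0), goal=(19, 19))
```

The solver never touches the terrain; it only sees whatever numbers the perceiver wrote into the grid, which is exactly the gap between the software and the intent behind it.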
A philosopher thinking about this could think up a mover-from-point-A-to-point-B which directly implements my ‘wish’ to get from A to B. It will almost certainly expose me to non-survivable accelerations, or worse yet, destroy buildings on its path (because I forgot to say not to in the wish). That is because when you employ verbal reasoning you start your thinking directly from the intent.
Alas, we do not know how to reduce intent to something that is not made of intent.
edit: that is, in the mind of the philosopher, the thing is actually minimizing—or maximizing—some metric along a real-world path. We don’t know how to do that. We do not know how we do that. We don’t even know whether we actually do that ourselves.
edit: We might figure out how to do that, but it is a separate problem from improvements to either my first bullet point or my second.
Another thing that I forgot to mention: the relationship between the solver and the model is inherently different from the relationship between a self-driving car and the world. The solver has a god’s-eye view and works in high-level terms. The car looks through sensors. The solver cannot be directly connected to the real world, or even to a reductionist, detailed physics simulator (it is too hard to define the comfortable path when the car, too, is made of atoms).
There’s the AK-47 rifle: you pull the trigger, the bullets come out one after another...
heh. An AK-47 is more on the “tool” level. That variety of unfriendliness on the “daemon” level would be an automated motion-sensing gun turret.
But to someone from muzzle-loader times, an AK-47 would look rather daemon-like: it auto-reloads and fires… and to someone with a self-driving battle-tank squad that runs itself using an AI from Starcraft or something, the motion-sensing turret is just another land mine.
True. It’s a gradient, not entirely discrete. C was once a high-level language; now it’s a portable assembler. Tools get ridiculously more sophisticated as we ride up Moore’s Law, while still being creatures of computer science that instantiate discrete mathematics.
As I said over in that other thread, a necessary (though not sufficient, I think) difference between “daemon” and “independent agent” will be the optimisation of thinking up new optimisations. I would expect that compiler writers are all over this stuff already and that there’s a considerable body of work on the subject.
And then there’s deciding whether a lossy optimisation will do the job, which is where, as a sysadmin, I would not be entirely comfortable with my tools acting unsupervised. (Loose analogy: I know I can’t tell a 320 kbps MP3 from a 24/96 FLAC, but it took ten years of A/B testing on humans for MP3 encoders not to suck.)
Hmm, in my view it is more of a goal distinction than an abilities distinction.
The model popular here is that of the ‘expected utility maximizer’, whose ‘utility function’ is defined on the real world. Such an agent does want to build the most accurate model of the real world it can, so as to maximize that function as well as possible, and it tries to avoid corruption of the function, etc. It also wants its outputs to affect the world, and if put in a box, it will try to craft outputs that do things in the real world even if you only wanted to look at them.
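In rough symbols (a schematic, not anyone's precise formulation), that picture is something like

$$a^* = \arg\max_{a} \sum_{w} P(w \mid a)\, U(w),$$

where the utility $U$ is defined over world-states $w$ rather than over the agent's sensory inputs; that is why such an agent cares about having an accurate $P$ and about protecting $U$ from corruption.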
This is all very ontologically basic to humans. We easily philosophize about such stuff.
Meanwhile, we don’t know how to do that. We don’t know how to reduce that real-world ‘utility’ to elementary operations performed on the sensory input (neither directly nor on a meta level). The current solution involves making one part that creates/updates a mathematically defined problem that another part finds mathematical solutions to; the solutions are then displayed if it is a tool, or applied to the real world if it isn’t. The wisdom of applying those solutions to the real world is an entirely separate issue. The point is that the latter works like a tool if boxed, not like a caged animal (or a caged human).
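Schematically, with invented names and as a cartoon rather than a design, the distinction being described comes down to where the solution goes after the purely mathematical step:

```python
def run_cycle(formalize, solve, actuators=None):
    """Toy schematic of the two-part design described above.

    formalize: builds/updates a mathematically defined problem from current data.
    solve:     finds a mathematical solution to that problem.
    actuators: if None, we are in 'tool' mode and only display the answer;
               otherwise the solution is applied to the real world.
    """
    problem = formalize()          # one part creates/updates the formal problem
    solution = solve(problem)      # another part solves it, purely mathematically
    if actuators is None:
        print(solution)            # tool: the solution is shown to a human
    else:
        actuators.apply(solution)  # agent: the solution acts on the world
    return solution
```

Whether it is ever wise to pass in `actuators` is the separate issue noted above; the boxed version simply never gets them.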
edit: another problem, I think, is that many of the ‘difficulty of friendliness’ arguments are just special cases of the general ‘difficulty of world intentionality’.
I think this is a bit of a misperception stemming from the use of the “paperclip maximizer” example to illustrate things about instrumental reasoning. Certainly folk like Eliezer or Wei Dai or Stuart Armstrong or Paul Christiano have often talked about how a paperclip maximizer is much of the way to FAI (in having a world-model robust enough to use consequentialism). Note that people also like to use the AIXI framework as a model, and use it to talk about how AIXI is set up not as a paperclip maximizer but as a wireheader (pornography and birth control rather than sex and offspring), with its utility function defined over sensory inputs rather than a model of the external world.
For another example, when talking about the idea of creating an AI with some external reward that can be administered by humans but not as easily hacked/wireheaded by the AI itself, people use the example of an AI designed to seek factors of certain specified numbers, or a proof or disproof of the Riemann hypothesis according to some internal proof-checking mechanism, etc., recognizing the role of wireheading and the difficulty of specifying goals externally rather than via simple percepts and the like.