If the AI’s map represents the territory accurately enough, the AI can use the map to check the consequences of returning different actions, then pick one action and return it, ipso facto affecting the territory. I think I already know how to build a working paperclipper in a Game of Life universe, and it doesn’t seem to wirehead itself. Do you have a strong argument why all non-magical real-world AIs will wirehead themselves before they get a chance to hurt humans?
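(For concreteness, a minimal sketch of the kind of map-based, one-shot action selection described here, assuming a toy Game of Life map; `step_map`, `count_paperclip_patterns`, and the action encoding are illustrative stand-ins, not the actual construction.)

```python
# Illustrative sketch only: a one-shot agent that evaluates candidate actions
# entirely inside its map (a copy of a toy Game of Life grid), then returns
# the single best action and halts. Every name here is a hypothetical stand-in.
import copy
from itertools import product

def step_map(live):
    """Advance a Game of Life grid (a set of live (x, y) cells) by one tick."""
    counts = {}
    for (x, y) in live:
        for dx, dy in product((-1, 0, 1), repeat=2):
            if (dx, dy) != (0, 0):
                counts[(x + dx, y + dy)] = counts.get((x + dx, y + dy), 0) + 1
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

def count_paperclip_patterns(live):
    """Stand-in goal measure over the map, e.g. just the number of live cells."""
    return len(live)

def choose_action(map_grid, candidate_actions, horizon=50):
    """Simulate each candidate action inside the map and return the best one."""
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        world = copy.deepcopy(map_grid)
        world |= action                 # an "action" is just a set of cells to switch on
        for _ in range(horizon):
            world = step_map(world)
        score = count_paperclip_patterns(world)
        if score > best_score:
            best_action, best_score = action, score
    return best_action                  # returned once; the program then halts
```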
Eurisko is an important datum.
This isn’t quite an AGI. In particular, it doesn’t even take input from its surroundings.
Fair enough. We can handwave a little and say that AI2 built by AI1 might be able to sense things and self-modify, but this offloading of the whole problem to AI1 is not really satisfying. We’d like to understand exactly how AIs should sense and self-modify, and right now we don’t.
Let it build a machine that takes input from its own surroundings.
But the new machine can’t self-modify. My point is about the limitations of cousin_it’s example. The machine has a completely accurate model of the world as input and uses an extremely inefficient algorithm to find a way to paperclip the world.
The second machine can be designed to build a third machine, based on the second machine’s observations.
Yes, but now the argument that you will converge to a paperclipper is much weaker.
Perhaps it’s also worth bringing up the example of controllers, which don’t wirehead (or do they, once sufficiently complex?) and do optimize the real world. (Thermostats confuse me. Do they have intentionality despite lacking explicit representations? (FWIW Searle told me the answer was no because of something about consciousness, but I’m not sure how seriously he considered my question.))
You are looking for intentionality in the wrong place. Why do thermostats exist? Follow the improbability.
Yes, actual thermostats got their shard of the Void from humans, just as humans got their shard of the Void from evolution. (I’d say “God” and not “the Void”, but whatever.) But does evolution have intentionality? The point is to determine whether or not intentionality is fundamentally different from seemingly-simpler kinds of optimization—and if it’s not, then why does symbol grounding seem like such a difficult problem? …Or something, my brain is too stressed to actually think.
Taboo “intentionality”.
Yes, discerning the hidden properties of “intentionality” is the goal which motivates looking at the edge case of thermostats.
I don’t see why it doesn’t seem to wirehead itself, unless for some reason the Game of Life manipulators are too clumsy to send a glider that achieves the goal by altering a value within the paperclipper (e.g. within its map). Ultimately the issue is that the goal is achieved when some cells within the paperclipper which define the goal acquire certain values. You need a rather specific action generator for it to avoid generating the action that changes the cells within the paperclipper. Can you explain why this solution would not be arrived at? Can your paperclipper then self-improve if it can’t self-modify?
I do imagine that, very laboriously, you can manage to define some sort of paperclipping goal (maximize the number of live cells?) for an AI into which you have hand-coded a complete understanding of the Game of Life, and you might be able to make it not recognize sending a glider into the goal system and changing it as ‘goal accomplished’. The issue is not whether it’s possible (I can make a battery of self-replicating glider guns and proclaim them to be an AI); the issue is whether it is at all likely to happen without an immense amount of work implementing by hand, into the AI, much of the stuff that the AI ought to learn. Ultimately there is no role for the AI’s intelligence as an intelligence amplifier, only as an obstacle that gets in your way.
Furthermore, keep in mind that the AI’s model of the Game of Life universe is incomplete. The map does not represent the territory accurately enough, and cannot, since the AI occupies only a small fraction of the universe and encodes the universe into itself very inefficiently.
The paperclipper’s goal is not to modify the map in a specific way, but to fill the return value register with a value that obeys specific constraints. (Or to zoom in even further, the paperclipper doesn’t even have a fundamental “goal”. The paperclipper just enumerates different values until it finds one that fits the constraints. When a value is found, it gets written to the register, and the program halts. That’s all the program does.) After that value ends up in the register, it causes ripples in the world, because the register is physically connected to actuators or something, which were also described in the paperclipper’s map. If the value indeed obeys the constraints, the ripples in the world will lead to creating many paperclips.
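(A toy rendering of that structure, with hedged stand-ins for the map, the constraint, and the register; the point is only that there is no separate “goal” object, just an enumerate-check-write-halt loop.)

```python
# Illustrative toy only. The "paperclipper" below has no goal object: it just
# enumerates candidate return values, checks each against a constraint that is
# fixed at definition time, writes the first passing value into a register,
# and halts. The map, constraint, and register are all stand-ins.

def consequences_in_map(value):
    """Toy map: how many 'paperclips' does returning `value` lead to?"""
    return value * 3 if value % 7 == 0 else 0     # arbitrary toy dynamics

def satisfies_constraint(value):
    return consequences_in_map(value) >= 42       # the fixed constraint

register = []                                     # stand-in for the output register

def run():
    value = 0
    while True:
        if satisfies_constraint(value):
            register.append(value)                # the only side effect
            return                                # ...after which the program halts
        value += 1

run()
print(register)                                   # [14] in this toy
```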
Not sure what sending gliders has to do with the topic. We’re talking about the paperclipper wireheading itself, not the game manipulators trying to wirehead the paperclipper.
Incompleteness of the model, self-modification and other issues seem to be red herrings. If we have a simple model where wireheading doesn’t happen, why should we believe that wireheading will necessarily happen in more complex models? I think a more formal argument is needed here.
You don’t have a simple model where wireheading doesn’t happen; you have a model where you didn’t see how the wireheading would happen: the paperclipper, erm, touching itself (i.e. its own map) with its manipulators, satisfying the condition without filling the universe with paperclips.
edit: that is to say, an agent which doesn’t internally screw up its model can still, e.g., dissolve the coating off a RAM chip and attach a wire there, or failing that, produce fake input for its own senses (which we humans do a whole lot).
Maybe you misunderstood the post. The paperclipper in the post first spends some time thinking without outputting any actions, then it outputs one single action and halts, after which any changes to the map are irrelevant.
We don’t have many models of AIs that output multiple successive actions, but one possible model is to have a one-action AI whose action is to construct a successor AI. In this case the first AI doesn’t wirehead because it’s one-action, and the second AI doesn’t wirehead because it was designed by the first AI to affect the world rather than wirehead.
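(A sketch of that two-stage scheme under toy assumptions: AI1’s single action is to emit a successor policy chosen by evaluation inside AI1’s model, and the successor then runs a plain sense-act loop. The names and the toy “world” are hypothetical.)

```python
# Illustrative sketch only: a one-action AI whose single output is a successor.
# AI1 never senses the world after choosing; the successor (AI2) is the thing
# that actually senses and acts. All details below are toy assumptions.

def ai1_choose_successor(world_model, candidate_policies, evaluate_in_model):
    """AI1's one action: pick a successor policy by evaluating each candidate
    inside AI1's model, return it, and halt."""
    return max(candidate_policies,
               key=lambda policy: evaluate_in_model(world_model, policy))

def make_ai2(policy):
    """AI2: a plain sense-act loop running the policy AI1 selected. It was
    chosen for its predicted effect on the world, not on anyone's map."""
    def ai2(sense, act, steps):
        for _ in range(steps):
            act(policy(sense()))
    return ai2

# Toy usage: the "world" is a counter we would like driven up to 10.
policies = [lambda obs: 0, lambda obs: 1 if obs < 10 else 0]
evaluate = lambda start, policy: sum(policy(start + t) for t in range(10))
chosen = ai1_choose_successor(0, policies, evaluate)
state = {"x": 0}
make_ai2(chosen)(sense=lambda: state["x"],
                 act=lambda d: state.__setitem__("x", state["x"] + d),
                 steps=20)
print(state["x"])   # 10: the successor acts on the world, not on the model
```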
What makes it choose the action that fills the universe with paperclips over the action that achieves the goal by modifying the map? edit: or do you have some really specialized narrow AI that knows nothing whatsoever of itself in the world, and simply solves the paperclip maximization in a sandbox inside itself (a sandbox in which the goal does not exist), after which simple mechanisms make this action happen in the world?
edit: to clarify. What you don’t understand is that wireheading is a valid solution to the goal. The agent is not wireheading because it makes it happy; it’s wireheading because wireheading really is the best solution to the goal you have given it. You need to jump through hoops to make wireheading not be a valid solution from the agent’s perspective. You not liking it as a solution does not suffice. You thinking that it is a fake solution does not suffice. The agent has to discard that solution.
edit: to clarify even further. When evaluating possible solutions, the agent comes up with an action that makes a boolean function within itself return true. That can happen if the function, abstractly defined, in fact returns true; it can happen if the action modifies the boolean function so that it returns true; and it can happen if the action modifies the inputs to this boolean function to make it return true.
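(A toy illustration of those three cases, with a dict-based “agent” whose goal check is just a boolean function over a sensor reading; the setup is hypothetical, not anyone’s proposed design.)

```python
# Illustrative toy of the three cases listed above. An "agent" whose goal check
# is a boolean function over a sensor reading of the world; an action can
# (a) actually change the world, (b) overwrite the goal function itself, or
# (c) corrupt the sensor so the unchanged function sees a fake input.

world = {"paperclips": 0}
agent = {
    "goal_check": lambda reading: reading >= 100,   # the boolean function
    "sensor":     lambda: world["paperclips"],      # its input
}

def internal_check():
    return agent["goal_check"](agent["sensor"]())

# (a) the function, abstractly defined, is in fact true of the world
world["paperclips"] = 100
assert internal_check()

# (b) the action modifies the boolean function itself
world["paperclips"] = 0
agent["goal_check"] = lambda reading: True
assert internal_check()

# (c) the action modifies the *inputs* to the boolean function
agent["goal_check"] = lambda reading: reading >= 100
agent["sensor"] = lambda: 100                        # fake sensor reading
assert internal_check()
```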
Yes. Though the sandbox is more like a quined formal description of the world with a copy of the AI in it. The AI can’t simulate the whole sandbox, but the AI can prove theorems about the sandbox, which is enough to pick a good action.
So, it proves a theorem that if it creates a glider in such-and-such a spot, directed so-and-so, then [the goal definition as given inside the AI] becomes true. Then it creates that glider in the real world, the glider glides, and hits straight into the goal definition as given inside the AI, making it true. Why is this an invalid solution? I know it’s not what you want it to do: you want it to come up with some mega self-replicating glider factory that will fill the universe with paperclips. But it ain’t obligated to do what you want.
The AI reasons with its map, the map of the world. The map depicts events that happen in the world outside the AI, and it also depicts the events that happen to the AI, or to the AI’s map of the world. In the AI’s map, an event in the world and the map’s picture of that event are different elements, just as they are different elements of the world itself. The goal that guides the AI’s choice of action can then distinguish between an event in the world and the map’s representation of that event, because these two events are separately depicted in its map.
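(A small sketch of that distinction under toy assumptions: the map holds separate elements for a fact about the world and for the map’s own record of that fact, and the goal is stated over the former only.)

```python
# Illustrative sketch only: the AI's map contains separate elements for a fact
# about the world and for the map's own record of that fact, so a goal defined
# over the former is not satisfied by an action that merely edits the latter.

map_of_world = {
    "world.paperclips": 3,            # the map's belief about the territory
    "map.paperclips_entry": 3,        # the map's depiction of its own record
}

def goal_satisfied(m):
    # The goal refers to the world-side element only.
    return m["world.paperclips"] >= 100

# A "wireheading" action edits only the map's record of itself:
map_of_world["map.paperclips_entry"] = 100
assert not goal_satisfied(map_of_world)

# An action the map predicts will actually add paperclips to the world:
map_of_world["world.paperclips"] = 100
assert goal_satisfied(map_of_world)
```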
Can it, however, distinguish between two different events in the world that result in the same map state?
edit: here’s an example for you. For you, some person you care about has the same place in your map even though their atoms get replaced, etc. If that person gets ill, you may want to mind-upload that person into an indistinguishable robot body, right? You’ll probably argue that it is a valid solution to escaping death. A lot of people have a different map, and they will argue that you’re just making a substitute for your own sake, since the person will be dead, gone forever. Some other people have a really bizarre map where they map ‘souls’ and have the person alive in ‘heaven’, which is on the map. The bottom line is, everyone is just trying to resolve the problem in the map. In the territory, everyone is gone every second.
edit: and yes, you can make a map which will distinguish between sending a glider that hits the computer and making a ton of paperclips. You still have a zillion world states, including ones not filled with paperclips, mapping to the same point in the map as the world filled with paperclips. Your best bet is just making the AI narrow enough that it can only find the solutions where the world is filled with paperclips.
I don’t know, the above reads to me as “Everything is confusing. Anyway, my bottom line is .” I don’t know how to parse this as an argument, how to use it to make any inferences about .
The purpose of the grandparent was to show that it’s not in principle problematic to distinguish between a goal state and that goal state’s image in the map, so there is no reason for wireheading to be consequentialistically appealing, so long as an agent is implemented carefully enough.
Because the AI’s goal doesn’t refer to a spot inside the computer running the AI. The AI just does formal math. You can think of the AI as a program that stops when it finds an integer N obeying a certain equation. Such a program won’t stop upon finding an integer N such that “returning N causes the creation of a glider that crashes into the computer and changes the representation of the equation so that N becomes a valid solution” or whatever. That N is not a valid solution to the original equation, so the program skips it and looks at the next one. Simple as that.
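(A toy version of that program, under the assumption that a simple arithmetic predicate can stand in for the “certain equation”: the predicate is fixed when the program is written and checking it is pure, so a candidate N that would only “work” by rewriting the stored equation simply fails the original equation and gets skipped.)

```python
# Illustrative toy of the point above: the predicate is fixed when the program
# is written, and checking it is pure, so a candidate N that would only "work"
# by rewriting the stored equation is simply not a solution and gets skipped.

def original_equation(n):
    # Stand-in for "returning n leads, per the formal world description,
    # to lots of paperclips". Here: a fixed arithmetic condition.
    return n * n - 5 * n + 6 == 0 and n > 2

def search():
    n = 0
    while True:
        if original_equation(n):   # evaluated against the equation as defined,
            return n               # not against any later, tampered-with copy
        n += 1

print(search())   # 3
```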
First, you defined the equation so that it included the computer and the AI itself (the simulator it uses to think, and also to self-improve as needed).
Now you are changing the definitions so that the equation is something else. There’s a good post by Eliezer about being specific, which you are not. Go define the equation first.
Also, it is not a question about narrow AI. I can right now write an ‘AI’ that would try to find a self-replicating glider gun that tiles the entire Game of Life with something. And yes, that AI may run inside a machine in the Game of Life. The issue is, that’s more like ‘evil terrorists using a protein-folding simulator AI connected to an automated genome lab to make a plague’ than ‘the AI maximizes paperclips’.
I’m bowing out of this discussion because it doesn’t seem to improve anyone’s understanding.
You handwave too much, and the people who already accept the premise like handwaving that sounds vaguely theoretical. Those who do not aren’t too impressed, and are only annoyed.
Or the people who understand the mathematics.
Cousin_it’s mathematics is correct, if counter-intuitive to those not used to thinking about quines. Whether it implies what he thinks it implies is a separate question as I discuss here.
Well, I assumed that he was building an AGI, and even agreed that it is entirely possible to rig the AI so that something the AI does inside a sim gets replicated in the outside world. I even gave an example: you make a narrow AI that generates a virus mostly by simulating molecular interactions (and has some sim of the human immune system, people’s responses to world events, what the WHO might do, and such) and wire it up to a virus-making lab that can vent its product into the air in the building or something (edit: or better yet, one that can mail samples to whatever addresses). That would be an AI that kills everyone. Including the AI itself in its sim would serve little functional role, and this AI won’t wirehead. It’s clear that AGI risk is not about this.
edit: and to clarify, the problem with vague handwaving is that without defining what you handwave around, it is easy to produce stuff that is irrelevant, but appears relevant and math-y.
edit: hmm, it seems the post with the virus-making AI example didn’t get posted. Still, http://lesswrong.com/lw/bfj/evidence_for_the_orthogonality_thesis/68cf and http://lesswrong.com/lw/bfj/evidence_for_the_orthogonality_thesis/68eo convey the point. I’ve never said it is literally impossible to make a narrow AI that is rigged to tile the game world with blocks. It is, clearly, possible. One could make a glider gun iterator that finds a self-replicating glider gun in the simulator, with some simple mechanisms set up to make that gun in the real world. That is not a case of an AI wanting to do something to the real world. That’s a glorified case of ‘my thermostat doesn’t wirehead’, to borrow from Will_Newsome.
The other issue is that one could immediately define some specific goal like ‘number of live cells’, and we could discuss this more specifically instead of handwaving vaguely about an ill-defined goal. But I can’t just define things narrowly for the other side of an argument. Wireheading is a problem for systems that can improve themselves: a system that can, e.g., decide that it can’t figure out how to maximize live cells but can prove some good theorems about four blocks.