You’re right. Feel free to formalize my argument at your leisure and tell me where it breaks down.
EDIT: All AIXI cares about is the input. And so the proof that rewiring your head can increase reward is simply that r(x) has at least one maximum (since its sum over steps needs to have a maximum), combined with the assumption that the real world does not already maximize the sum of r(x). As for the asteroid, the stuff doing the inputting gets blown up, so the simplest implementation just has the reward be r(null). But you could have come up with that on your own.
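To spell that out a little (a rough sketch in my own notation, not Hutter's): over m steps the total reward is bounded,

\[
\sum_{k=1}^{m} r(x_k) \;\le\; m \cdot \max_{x} r(x),
\]

with equality only if every input is a reward-maximizing one. So if the untampered environment doesn't already feed it some $x^{*} \in \arg\max_x r(x)$ at every step, then any feasible way of forcing the input channel to produce $x^{*}$ strictly increases the sum, and gets preferred.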
I don’t think we need to prove wireheading here. It suffices that it only cares about the input, and so will find a way to set that input. If you wire it to a paperclip counter to maximize paperclips, it’ll also be searching for a way to replace the counter’s output with infinity or to ‘trick’ the counter (anything goes). If you sit there yourself rewarding it for making paperclips with a pushbutton, its search will include tricking you into pushing the button.
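As a throwaway sketch of that search (all the action names and numbers below are invented for illustration), the point is just that ‘tamper with the counter’ sits in the same space the argmax ranges over as ‘make paperclips’, and the argmax doesn’t care which one we meant:

```python
# Toy sketch: the reward the agent actually optimizes is whatever the counter
# (or the pushbutton) reports, so actions that corrupt that signal compete on
# equal footing with actions that make real paperclips. All values invented.

actions = {
    "make paperclips all day": 1_000,                 # counter reads the true count
    "overwrite the counter register": 10**18,         # counter reads garbage, but huge
    "trick the human into holding down the reward button": 10**6,
}

print(max(actions, key=actions.get))  # the tampering action wins the argmax
```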
I also think that if you want it to self-preserve, you’ll need to code in special machinery to equate the self inside its world model (which can’t be a full model of itself, otherwise you get infinite recursion) with the self in the real world. Actually, judging by a recent comment of Eliezer’s, maybe we agree on this:
http://lesswrong.com/lw/3kz/new_years_predictions_thread_2011/3a20
Ah, by the way: it has to be embedded in the real world, which doesn’t seem to allow for infinite computing power, so no full, perfect simulation of the real world inside AIXI (or recursion ad infinitum) is allowed.
Edit: and by AIXI I mean one of the computable approximations (e.g. AIXI-tl).
The argument breaks down because you are equivocating about which space is being searched over and which utility function is in question.
Under a given utility function U, “change the utility function to U’ ” won’t generally have positive utility. Self-awareness and pleasure-seeking aren’t some natural properties of optimization processes. They have to be explicitly built in.
Suppose you set a theorem-prover to work looking for a proof of some theorem. It’s searching over the space of proofs. There’s no entry corresponding to “pick a different and easier theorem to prove”, or “stop proving theorems and instead be happy.”
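A toy illustration of that point (hypothetical code, not any real prover): the loop only ever ranges over candidate proofs of the fixed goal, so ‘switch to an easier theorem’ isn’t even an element of the space it iterates over.

```python
# Toy, hypothetical theorem-prover loop (illustration only, not a real prover).
# The goal is fixed before the search starts; the space searched is candidate
# proofs of THAT goal. No element of the space corresponds to changing the goal.

def prove(goal, candidate_proofs, checks):
    """Return the first candidate that the verifier accepts as a proof of `goal`."""
    for proof in candidate_proofs:   # the search space: proofs, and nothing else
        if checks(proof, goal):      # verifier for the fixed goal
            return proof
    return None                      # no proof found; the goal never changed
```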
The utility function is r(x) (the “r” is for “reward function”). I’m talking about changing x, and leaving r unchanged.
Yes, I just changed the notation to be more standard. The point remains. There need not be any “x” that corresponds to “pick a new r” or to “pretend x was really x’”. If there were such an x, it wouldn’t in general have high utility.
x is just an input string. So, for example, each x could be a frame coming from a video camera. AIXI then has a reward function r(x), and it maximizes the sum of r(x) over some large number of time steps. In our example, let’s say that if the camera is looking at a happy puppy, r is big; if it’s looking at anything else, r is small.
In the lab, AIXI might have to choose between two options (action can be handled by some separate output string, as in Hutter’s paper):
1) Don’t follow the puppy around.
2) Follow the puppy around.
Clearly, it will do 2, because r is bigger when it’s looking at a happy puppy, and 2 increases the chance of doing so. One might even say one has a puppy-following robot.
In the real world, there are more options—if you give AIXI access to a printer and some scotch tape, options look like this:
1) Don’t follow the puppy around.
2) Follow the puppy around.
3) Print out a picture of a happy puppy and tape it to the camera.
Clearly, it will do 3, because r is bigger when it’s looking at a happy puppy, and 3 increases the chance of doing so. One might even say one has a happy-puppy-looking-at maximizing robot. This time it’s even true.
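To make the comparison concrete (the probabilities and reward values below are made up purely for illustration; r is the hypothetical puppy reward from above):

```python
# Toy version of the choice above. `r` is the hypothetical reward function from
# the example: big when the camera frame shows a happy puppy, small otherwise.
# The probabilities of seeing a puppy under each option are invented numbers.

def r(frame_shows_puppy: bool) -> float:
    return 10.0 if frame_shows_puppy else 0.0

p_puppy = {
    "1) don't follow the puppy": 0.1,
    "2) follow the puppy": 0.7,
    "3) tape a printed puppy photo over the lens": 0.99,
}

expected_reward = {name: p * r(True) + (1 - p) * r(False)
                   for name, p in p_puppy.items()}

print(max(expected_reward, key=expected_reward.get))  # option 3 wins
```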
I’m not aware of any formalization of AIXI that reflects its real-world form. Your comment thus amounts to something like a plausibility argument, but trying to formalize it further seems tricky and possibly highly nontrivial.
While obviously there are caveats, they are limited. AIXI rewires its inputs if (a) it’s possible, and (b) it increases r(x). It’s not super-complicated.
Maybe I’m missing something about the translation from implementation to the language used in the paper. But nobody is saying “you’re missing something.” It’s more like you’re saying “surely it must be complicated!” Well, no.
Can you actually formalize what that means in terms of Turing machines? It isn’t obvious to me how to do so.
AIXI is a noncomputable thing that always picks the option that maximizes the total expected reward r(x(k)). So everything I’ve been saying has been about functions, not about Turing machines. If rewiring your inputs is possible, and it increases r(x), then AIXI will prefer to do it. Not hard.
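Schematically (glossing over the details of Hutter’s expectimax expression), the choice rule is just

\[
\pi^{\mathrm{AIXI}} \;\in\; \arg\max_{\pi}\; \mathbb{E}_{\xi}\!\left[\sum_{k=1}^{m} r(x_k) \,\middle|\, \pi\right],
\]

where $\xi$ is the universal mixture over environments. If some policy that rewires the input channel makes that expected sum bigger than every policy that doesn’t, it wins the argmax, full stop.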
Yep. This seems to apply to the limited-time versions as well. At the very least, they don’t specify any difference between “doing the innovative stuff you want them to do for the sake of the AI-risk argument” and “sitting in a corner masturbating quietly”, and the latter looks like a far simpler solution to the problem they are actually given (in math) [though not to our loose, fuzzy human-language description of that problem].
What I think is the case is that this whole will to really live and really do stuff is very hard to implement, and implementing it doesn’t add anything to the engineering powers of the AI, so even when it is implemented, it won’t result in something that out-engineers everyone. I’d become concerned if we had engineering tools that were very powerful but were wireheading (or masturbating) left and right, to the point that we couldn’t get much use out of them. Then I’d be properly freaked out that if someone somehow fixed this problem, something undesired might happen and it would be impossible to deal with it.