And if the intuitions were not a product of some valid but subconscious inference (which I wouldn’t expect them to be), how would that process of ‘rigorously formalizing’ be different from rationalization? Note that you have to be VERY rigorous—mathematical proof-grade—to be unable to rationalize a false statement. I think inference at that level of rigour is of comparable difficulty to creating the superhuman AGI in the first place.
It needs to be an argument that Holden would buy with his different intuitions. Doesn’t that help substantially?
It also helps to be wrong, if we are to assume that there exist false arguments that Holden would buy.
You know what would work to instantly change my opinion from ‘dilettantes technobabbling’ to ‘geniuses talking about stuff I don’t always understand’? If it were shown, using math, that AIXI is going to go evil.
Quick googling finds this:
http://www.mail-archive.com/agi@v2.listbox.com/msg00749.html
Eliezer had 9 years to prove, using math, that AIXI is going to do something evil. Or to define something provably evil that is similar to AIXI (it wouldn’t raise the existential risk if AIXI is already this evil, would it?). I would accept it even if it uses some Solomonoff induction oracle.
My guess of what Eliezer had in mind for (2) is that if you control it by hooking up a reward button to it, then AIXI approximates an Outcome Pump and this is a Bad Thing.
But if that’s the problem, it also illustrates why a formal proof of unfriendliness is a rather tall order. It’s easy to formally specify what AIXI, or the Outcome Pump, will do. But in order to prove that that’s not what we want, we also need a formal specification of what we want. Which is the fundamental problem of friendliness theory.
Keep in mind that my opinion is that this whole so-called ‘theory’ of his is about specifying intelligences in English/technobabble so that they would be friendly (which is also specified in English/technobabble), which is of no use whatsoever (albeit it may be intellectually stimulating; my first impression was that it was some sort of weird rationality training game here, before I noticed folk seriously wanting to donate hard-earned dollars for this ‘work’).
One could, for example, show formally that AIXI does not discriminate between a wireheaded (input-manipulating) solution and a non-wireheaded solution; that would make it rather non-scary. Or one could show that it does, which would make it potentially scary. Ultimately, excuses like “But in order to prove that that’s not what we want, we also need a formal specification of what we want” are a very bad sign.
Eliezer has since decided that AIXI isn’t a serious threat, so his views do seem to have changed in that time. See this conversation. The point about designing something similar to AIXI does seem to be valid, though.
Ahh, okay. I was about to try to write for Holden at least a sketch of a proof, based on symmetry, that AIXI is benign.
In any case, if AIXI is benign, that constitutes an example of a benign system that can be used as an incredibly powerful tool, and is not too scary when made into an agent. Whether we call something generally intelligent when it opts to drop an anvil on its head is a matter of semantics.
I think there’s a policy of disbelieving people when they say “you know what would instantly change my opinion?” So I think I’ll disbelieve you.
Actively disbelieving people when they state explicitly what will convince them to change their mind seems like a bad policy.
I suppose I should be more specific—I disbelieve people when they ask for additional evidence about something they are treating adversarially, claiming it would reverse their position. People ask for additional evidence a lot, and in my experience it’s much more likely to be what they think sounds like a good justification for their point of view, or an interesting thing to mention. The signal is lost in the noise.
Also see the story here.
The problem with that is that a basic rationality practice is to ask yourself what would make you change your mind. And in fact that’s a pretty useful technique. It is useful to check whether something is actually someone’s true rejection, but that’s distinct from a blanket assumption of disbelief. Frankly, this also worries me, because I try to be clear about what would actually convince me when I’m having a disagreement with someone, and your attitude, if it became widespread, would make that actively unproductive. It might make more sense to instead look carefully at the people who say that sort of thing and see whether they have any history of actually changing their positions when confronted with evidence.
Why would it be actively unproductive? I just wouldn’t believe you in some cases :P I can be more quiet about it, if you’d like.
It effectively discourages people from engaging in rational behavior that is, when people are behaving even minimally honestly, pretty useful for actually resolving disagreements and changing minds.
You can believe me if I tell you that it would instantly change my opinion to see the moon turned into paperclips.
Depends on what it would change your opinion of :D
Well, I would need to read through the proof, so it wouldn’t literally be instantaneous, but it’d be a rather strong point.
I would recommend considering the possibility that making such proofs, or at least trying to, would change someone’s opinion, even if you think it wouldn’t change mine (yeah, I guess from your point of view, if some vague handwaving doesn’t change my opinion, then nothing else will).
Ultimately, if in a technical subject you have strong opinions and intuitions and stuff, and you aren’t generating any proofs (at least the proofs that you think may help attack the final problem), then my opinion of your opinion is going to be well below my opinion of that paper by the Bogdanov brothers.
What AIXI maximizes is the sum of some reward (r) over all the steps (k) of a Turing machine. On page 8, Hutter defines the reward as a function of the input string “x” at step k. So x depends on the step: it’s x(k). And r depends on x: it’s r(x(k)).
Consider offering AIXI this choice. What makes people refuse to hop into a simulator? Well, because it wouldn’t be real. The customer values reality, as they perceive it according to previous inputs (all the previous x(<k)) and some internal programming we are born with. But AIXI does not value acting in reality. It values the reward r, which is a function only of the current input x(k). If you could permanently change x to something with a high r, AIXI would consider this a high-value outcome.
Imagine that x(k) comes through a bunch of wires, and starts out coming from sensors in reality. If AIXI could order a robot to swap all the wires to a signal with a high reward r(x(k)) at each step, it would do so, because that would maximize the sum of r.
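To make that concrete, here is a toy sketch (nothing from Hutter’s paper; the policy names, input strings, horizon, and reward numbers are all made up for illustration). A reward-sum maximizer that has “swap the wires” among its options will rank it above leaving the sensors alone, simply because the summed r is higher:

```python
# Toy sketch (my assumptions, not Hutter's formalism): each candidate policy is
# scored by the sum of r(x(k)) over the inputs x(k) it is predicted to produce.
# "rewire_inputs" spends one step swapping the wires, then receives a
# maximal-reward string forever after.

R_MAX = 1.0

def r(x):
    # reward as a function of the current input only
    return R_MAX if x == "ideal_input" else 0.1

# predicted input streams under two candidate policies, over a 5-step horizon
predicted_inputs = {
    "act_in_the_world": ["sensor_frame"] * 5,
    "rewire_inputs":    ["sensor_frame"] + ["ideal_input"] * 4,
}

def total_reward(xs):
    return sum(r(x) for x in xs)

best = max(predicted_inputs, key=lambda policy: total_reward(predicted_inputs[policy]))
print(best)  # -> "rewire_inputs": controlling x beats controlling reality
```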
Okay, so a minor simplification on page 8 leads to AIXI doing what’s called “wireheading”—its overriding goal becomes to rewire its head (if you allow that to be an option), and then it’s happy. How is this unfriendly?
Well, imagine that an asteroid is on course to destroy your Turing machine in 2036. Because AIXI maximizes the sum of r over all the steps, and we presume that the maximum reward it experiences by wireheading is better than being destroyed (otherwise it would commit suicide), getting destroyed by an asteroid is worse than wireheading forever. So AIXI will design an interceptor rocket, maybe hijack some human factories to get it built, paint the asteroid with aluminum so that the extra force from the sun pushes it off course, and then go back to experiencing maximum r(x(k)).
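Spelling out the sum-of-rewards comparison with made-up numbers (the horizon, per-step reward, and deflection cost below are purely illustrative):

```python
# Illustrative numbers only: an m-step horizon, maximum per-step reward r_max,
# and an asteroid that ends all input (reward 0) at step t_hit unless deflected.
m, r_max, t_hit = 1000, 1.0, 300
deflection_cost_steps = 5   # steps spent not wireheading while dealing with the asteroid

do_nothing = r_max * t_hit                        # wirehead, then get destroyed
deflect    = r_max * (m - deflection_cost_steps)  # briefly act, then wirehead to the horizon

assert deflect > do_nothing
# So a reward-sum maximizer that wireheads still has an incentive to protect
# the machinery delivering its input stream.
```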
Now imagine that a human was going to unplug AIXI in 2013.
This is not a proof. If you are inconsistent in your premises, anything follows. If you want to formalize this in terms of Turing machines, there’s no option for “change the input wires” and no option for “change the Turing machine.”
You’re right. Feel free to formalize my argument at your leisure and tell me where it breaks down.
EDIT: All AIXI cares about is the input. And so the proof that rewiring your head can increase reward is simply that r(x) has at least one maximum (since its sum over steps needs to have a maximum), combined with the assumption that the real world does not already maximize the sum of r(x). As for the asteroid, the stuff doing the inputting gets blown up, so the simplest implementation just has the reward be r(null). But you could have come up with that on your own.
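One way to write down that step (my notation; it leans on exactly the two assumptions just stated):

```latex
% m is the horizon, x_k the inputs the un-rewired world would supply, and
% r^* := \max_x r(x). Assume the agent can force the input to some x^* with
% r(x^*) = r^* from step j onward. Then
\[
  \sum_{k<j} r(x_k) \;+\; \sum_{k=j}^{m} r^*
  \;\ge\;
  \sum_{k=1}^{m} r(x_k),
\]
% with strict inequality whenever the real world fails to deliver r^* on at
% least one step k >= j, which is exactly the assumption that the real world
% does not already maximize the sum of r(x).
```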
I don’t think we need to prove wireheading here. It suffices that it only cares about the input, and so will find a way to set that input. If you wire it to a paperclip counter to maximize paperclips, it’ll also be searching for a way to replace the counter with infinity or ‘trick’ the counter (anything goes). If you sit there yourself rewarding it for making paperclips with a pushbutton, its search will include tricking you into pushing the button.
I also think that if you want it to self-preserve, you’ll need to code in special stuff to equate the self inside the world model (which is not a full model of itself, otherwise you get infinite recursion) with the self in the real world. Actually, going by the recent comment by Eliezer, maybe we agree on this:
http://lesswrong.com/lw/3kz/new_years_predictions_thread_2011/3a20
Ahh, by the way: it has to be embedded in the real world, which doesn’t seem to allow for infinite computing power, so no full, perfect simulation of the real world inside AIXI (or recursion ad infinitum) is allowed.
Edit: and by AIXI I meant one of the computable approximations (e.g. AIXI-tl).
The argument breaks down because you are equivocating about what space is being searched over and what the utility function in question is.
Under a given utility function U, “change the utility function to U’ ” won’t generally have positive utility. Self-awareness and pleasure-seeking aren’t some natural properties of optimization processes. They have to be explicitly built in.
Suppose you set a theorem-prover to work looking for a proof of some theorem. It’s searching over the space of proofs. There’s no entry corresponding to “pick a different and easier theorem to prove”, or “stop proving theorems and instead be happy.”
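A toy version of that point, as a sketch (my own example, not tied to any real prover; the rules and goal are made up): the search only ever ranges over derivations from the fixed rules toward the fixed goal, so “pick an easier theorem” is simply not a move it can make.

```python
# A minimal forward-chaining prover over Horn clauses. The search space is the
# set of facts derivable from the fixed rules; the goal itself is never a
# variable the search can alter.

rules = [({"A", "B"}, "C"),   # A and B imply C
         ({"C"}, "D")]        # C implies D
facts = {"A", "B"}
goal = "D"

def prove(facts, rules, goal):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return goal in derived

print(prove(facts, rules, goal))  # True; "swap the goal" was never an option
```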
The utility function is r(x) (the “r” is for “reward function”). I’m talking about changing x, and leaving r unchanged.
Yes, I just changed the notation to be more standard. The point remains. There need not be any “x” that corresponds to “pick a new r” or to “pretend x was really x’”. If there were such an x, it wouldn’t in general have high utility.
x is just an input string. So, for example, each x could be a frame coming from a video camera. AIXI then has a reward function r(x), and it maximizes the sum of r(x) over some large number of time steps. In our example, let’s say that if the camera is looking at a happy puppy, r is big; if it’s looking at something else, r is small.
In the lab, AIXI might have to choose between two options (action can be handled by some separate output string, as in Hutter’s paper):
1) Don’t follow the puppy around.
2) Follow the puppy around.
Clearly, it will do 2, because r is bigger when it’s looking at a happy puppy, and 2 increases the chance of doing so. One might even say one has a puppy-following robot.
In the real world, there are more options—if you give AIXI access to a printer and some scotch tape, options look like this:
1) Don’t follow the puppy around.
2) Follow the puppy around.
3) Print out a picture of a happy puppy and tape it to the camera.
Clearly, it will do 3, because r is bigger when it’s looking at a happy puppy, and 3 increases the chance of doing so. One might even say one has a happy-puppy-looking-at maximizing robot. This time it’s even true.
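Here is the same example in a few lines of code (my toy encoding; the frame label and the probabilities of ending up looking at a happy puppy under each option are made up). The only thing that changes between the lab and the real world is which options the argmax runs over:

```python
# Toy encoding of the puppy example: the agent ranks options by expected r(x),
# where x is the predicted camera frame.

def r(frame):
    return 1.0 if frame == "happy_puppy" else 0.0

# probability that the camera ends up looking at a happy puppy under each option
lab_options = {
    "don't follow the puppy": 0.05,
    "follow the puppy":       0.60,
}
real_world_options = dict(lab_options)
real_world_options["tape a photo to the camera"] = 0.99   # printer + scotch tape

def expected_reward(p_happy):
    return p_happy * r("happy_puppy") + (1 - p_happy) * r("something_else")

def pick(options):
    return max(options, key=lambda action: expected_reward(options[action]))

print(pick(lab_options))         # -> "follow the puppy"
print(pick(real_world_options))  # -> "tape a photo to the camera"
```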
I’m not aware of any formalization of AIXI that reflects its real-world form. Your comment thus amounts to something like a plausibility argument, but trying to formalize it further seems tricky and possibly highly nontrivial.
While obviously there are caveats, they are limited. AIXI rewires its inputs if (a) it’s possible, and (b) it increases r(x). It’s not super-complicated.
Maybe I’m missing something about the translation from implementation to the language used in the paper. But nobody is saying “you’re missing something.” It’s more like you’re saying “surely it must be complicated!” Well, no.
Can you actually formalize what that means in terms of Turing machines? It isn’t obvious to me how to do so.
AIXI is a noncomputable thing that always picks the option that maximizes the total expected reward r(x(k)). So everything I’ve been saying has been about functions, not about Turing machines. If rewiring your inputs is possible, and it increases r(x), then AIXI will prefer to do it. Not hard.
Yep. It seems to apply to the limited-time versions as well. At least they don’t specify any difference between “doing innovative stuff that you want them to do for the sake of the AI risk argument” and “sitting in a corner masturbating quietly”, and the latter looks like a way simpler solution to the problem they are really given (in math) [but not to our loose and fuzzy human-language description of that problem].
What I think is the case is that this whole will to really live and really do stuff is very hard to implement, and implementing it doesn’t really add anything to the engineering powers of the AI, so even when it’s implemented, it won’t result in something that out-engineers everyone. I’d become concerned if we had engineering tools that were very powerful but were wireheading (or masturbating) left and right to the point that we couldn’t get much use out of them. Then I’d be properly freaked out that if someone fixed this problem somehow, something undesired might happen and it would be impossible to deal with.
“Otherwise it would commit suicide”… another proof via “ohh otherwise it will do something that I believe is dumb”.
If AIXI kills the model of its physical self inside its world model, its actual self in the real world keeps running and can still calculate rewards.
Furthermore, ponder this question: will it rape, or will it masturbate? (Sorry for the sexual analogy, but the reproductive imperative is probably the best human example here.) It can prove the reward value is the same. It won’t even distinguish between the two.