One of the many reasons that I will win my bet with Eliezer is that it is impossible for an AI to understand itself.
That is just… trivially false.
If it could, it would be able to predict its own actions, and this is a logical contradiction, just as it is for us.
And that is the worst reasoning I have encountered in at least a week. Not only is it trying to foist a nonsensical definition of ‘understand’, an AI could predict its own actions. AND even if it couldn’t, it still wouldn’t be a logical contradiction. It’d just be a fact.
An AI could not predict its own actions, because any intelligent agent is quite capable of implementing the algorithm: “Take the predictor’s predicted action. Do the opposite.”
In order to predict itself (with 100% accuracy), it would have to be able to emulate its own programming, and this would cause a never-ending loop. Thus this is impossible.
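As a minimal Python sketch of the quoted “do the opposite” algorithm (function names are invented for illustration): any concrete predictor ends up falsified, and a predictor that works by literally emulating the agent runs into the never-ending loop described above.

```python
# Toy sketch of the "do the opposite" agent (names invented for the example).
def contrarian(predict_my_move):
    """Ask the predictor what I will do, then do the opposite."""
    return not predict_my_move(contrarian)

def fixed_predictor(agent):
    return True                      # predicts the agent will output True

print(contrarian(fixed_predictor))   # False -- the announced prediction fails

def emulating_predictor(agent):
    # Predicts by running the agent, which consults the predictor, which runs
    # the agent, ... Python surfaces the never-ending loop as a RecursionError.
    return agent(emulating_predictor)

try:
    contrarian(emulating_predictor)
except RecursionError:
    print("self-emulation never terminates")
```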
An AI could not predict its own actions, because any intelligent agent is quite capable of implementing the algorithm: “Take the predictor’s predicted action. Do the opposite.”
Ok. And why would your AI decide to do so? You seem to be showing that a sufficiently pathological AI won’t be able to predict its own actions. How this shows that other AIs won’t be able to predict their own actions with some degree of certainty seems off.
This isn’t pathological. For example, it is a logical contradiction for someone to predict my actions in advance (and tell me about it), because my “programming” will lead me to do something else, much like the above algorithm. This is a feature, not a bug. Being able to be predicted is a great weakness. Any well-programmed AI will avoid this weakness, just as we do.
Being able to be predicted is absolutely vital for making credible threats and promises. And, along with being able to predict accurately, it allows for cooperation with other rational agents.
There appears to be a lot of logic happening implicitly here, because I’m not following you.
You wrote:
An AI could not predict its own actions, because any intelligent agent is quite capable of implementing the algorithm: “Take the predictor’s predicted action. Do the opposite.”
Now, this seems like a very narrow sort of AI, one that would go and do something else just to contradict what was predicted.
For example, it is a logical contradiction for someone to predict my actions in advance (and tell me about it), because my “programming” will lead me to do something else, much like the above algorithm.
You seem to be using “logical contradiction” in a non-standard fashion. Do you mean it won’t happen given how your mind operates? In that case, permit me to make a few predictions about your actions over the next 48 hours (that you could probably predict also): 1) You will sleep at some point in that time period. 2) You will eat at some point in that time period. I make both of those with probability around .98 each. If we extend to one month, I’m willing to make a similarly confident prediction that you will make a phone call or check your email within that time. I’m pretty sure you are not going to go out of your way as a result of these predictions to try to go do something else.
You also seem to be missing the point about what an AI would actually need to improve. Say, for example, that the AI has a subroutine for factoring integers. If it comes up with a better algorithm for factoring integers, it can replace the subroutine with the new one. It doesn’t need to think deeply about how this will alter behavior.
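A minimal Python sketch of this subroutine-swap point (the routine names are invented for the example): the improvement only has to preserve the subroutine’s interface, not come with a prediction of the whole system’s future behavior.

```python
# Toy sketch: swap a slow factoring subroutine for a faster one, same interface.
def factor_trial_division(n):
    """Prime factorization by plain trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def factor_skip_evens(n):
    """A faster drop-in replacement: strip factors of 2, then try only odd divisors."""
    factors = []
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    d = 3
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 2
    if n > 1:
        factors.append(n)
    return factors

factor = factor_trial_division
assert factor(840) == [2, 2, 2, 3, 5, 7]
factor = factor_skip_evens                 # the "self-improvement" step: swap the subroutine
assert factor(840) == [2, 2, 2, 3, 5, 7]   # same interface, same answers, no self-prediction needed
```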
I agree with those predictions. However, my point would become clear if you attempted to translate your probability of 0.98 into a bet with me, with me betting $100 and you betting $5000. I would surely win the bet (with at least a probability of 0.98).
I am willing to bet, at 10,000 to 1 odds, that you will sleep sometime in the next 2 weeks. The payout on this bet is not transferable to your heirs.
I agree with those predictions. However, my point would become clear if you attempted to translate your probability of 0.98 into a bet with me, with me betting $100 and you betting $5000. I would surely win the bet (with at least a probability of 0.98).
No, it wouldn’t, because that’s a very different situation. My probability estimate for you not eating food in a 48 hour period if you get paid $5000 when you succeed and must pay $100 if you fail is much lower. If I made the bet with some third party I’d be perfectly willing to do so as long as I had some reassurance that the third party isn’t intending to pay you a large portion of the resulting winnings if you win.
I don’t find predictability a weakness. If someone says to me, “Hey, Alicorn, I predict you’re going to eat that sandwich you’re holding,” I’m going to say, “Yes. You are exactly right. And I’m glad you are! If you were wrong, then I wouldn’t get to eat this delicious sandwich, which I want (that being why I made it and picked it up).”
Did you have some other, less general sort of predictability in mind when you made the claim that it’s a weakness?
It is only universal predictability that is a weakness.
Why? Predicting my actions doesn’t make them actions I don’t want to take. Predicting I’ll eat a sandwich if I want one doesn’t hurt me; and if others can predict that I’ll cooperate on the prisoner’s dilemma iff my opponent will cooperate iff I’ll cooperate, so much the better for all concerned.
Can you give an example of a case where being predictable would hurt someone who goes about choosing actions well in the first place? Note that, as with the PD thing above, actions are dependent on context; if the prediction changes the context, then that will already be factored into an accurate prediction.
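A toy Python sketch of the conditional-cooperation point above (the depth limit is an artificial assumption added purely so the mutual simulation terminates; it is not anyone’s proposal from the thread):

```python
# "Cooperate iff my opponent will cooperate iff I'll cooperate", via a
# depth-limited simulation of the opponent.
def conditional_cooperator(opponent, depth=3):
    if depth == 0:
        return "C"                      # optimistic base case for the simulation
    # Cooperate exactly when a (shallower) simulation of the opponent,
    # facing this very strategy, would cooperate.
    return "C" if opponent(conditional_cooperator, depth - 1) == "C" else "D"

def always_defect(opponent, depth=3):
    return "D"

print(conditional_cooperator(conditional_cooperator))  # "C" -- predictability enables mutual cooperation
print(conditional_cooperator(always_defect))           # "D" -- but a defector cannot exploit it
```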
Can you give an example of a case where being predictable would hurt someone who goes about choosing actions well in the first place?
Good question. Your intuition is correct as long as your actions are chosen “optimally” in the game-theoretic sense. This is one of the ideas behind Nash equilibria: your opponent can’t gain anything from knowing your strategy and vice versa. A caveat is that the Nash equilibria of many games require “mixed strategies” with unpredictable randomizing, so if the opponent can predict the output of your random device, you’re in trouble.
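A toy matching-pennies simulation makes the caveat concrete (this particular setup is an illustration, not something from the discussion): an opponent who merely knows the 50/50 strategy gains nothing, while an opponent who can replay the random device wins every round.

```python
import random

# Matching pennies: the opponent wins a round when the two coins match.
ROUNDS = 10_000
my_rng = random.Random(42)
cloned_rng = random.Random(42)   # assume the opponent somehow copied my random device

matches_knowing_strategy = 0
matches_predicting_device = 0
for _ in range(ROUNDS):
    mine = my_rng.choice("HT")
    matches_knowing_strategy += (mine == "H")                       # fixed reply by someone who only knows the 50/50 mix
    matches_predicting_device += (mine == cloned_rng.choice("HT"))  # reply by someone who replays my randomness

print(matches_knowing_strategy / ROUNDS)    # ~0.5 : no edge from knowing the strategy
print(matches_predicting_device / ROUNDS)   #  1.0 : total edge from predicting the device
```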
If you can accurately predict a chess player’s move before they make it, then you have more time to think about your response. There are cases where this can make a difference—even if they happen to play perfectly.
Alicorn, your note about the PD implies that it is universally the case that there is some action that will benefit you even if others predict it. There is no reason to think that this is the case; and if there is even one instance where doing what others predict you will do is harmful, then being universally predictable is a weakness.
For example, it is a logical contradiction for someone to predict my actions in advance (and tell me about it),
Again, this is not a logical contradiction. You do not have a clear understanding of what the concept entails. It doesn’t mean ‘sometimes impractical’ or ‘often people adapt to avoid it’.
No, this really would be a logical contradiction if the agent being predicted does implement the stated algorithm (and won’t override it when something more important is at stake). It just has nothing to do with self-improvement, for which predicting abstract properties of specific algorithms is what matters; much like Rice’s theorem doesn’t mean we can’t prove that specific programs output pi, for example.
No, this really would be a logical contradiction if the agent being predicted does implement the stated algorithm
No, it is not a logical contradiction. The fact that someone can implement a stupid algorithm does not make the claim “it is a logical contradiction for someone to predict my actions in advance and tell me about it” true. Just because someone could implement a stupid algorithm for decision making, or a naive algorithm for prediction that doesn’t know when to shut up, doesn’t mean you can make that general claim. Not even close.
Your argument would probably apply if I were refuting a different but somewhat related assertion.
No, it is not a logical contradiction. The fact that someone can implement a stupid algorithm does not make the claim “it is a logical contradiction for someone to predict my actions in advance and tell me about it” true. Just because someone could implement a stupid algorithm for decision making, or a naive algorithm for prediction that doesn’t know when to shut up, doesn’t mean you can make that general claim. Not even close.
It does mean you can make a general claim analogous to Rice’s theorem / the undecidability of the halting problem — not that such a claim is incredibly interesting for our purposes.
Your argument would probably apply if I were refuting a different but somewhat related assertion.
Point taken; it doesn’t seem like we actually disagree about anything.
It does mean you can make a general claim analogous to Rice’s theorem / the undecidability of the halting problem — not that such a claim is incredibly interesting for our purposes.
The cache of this conversation is buried somewhat in my brain, but I think there is something to what you say here.
But an AI with that programming is predictable, and, much worse, manipulable! In order to get it to do anything, you need only inform it that you predicted that it will not do that thing*. It’s just a question of how long it takes people to realize that it has this behavior. It is far weaker than an AI that sometimes behaves as predicted and sometimes does not. Consider, e.g., Alicorn’s sandwich example; if we imagine an AI that needed to eat (a silly idea, but it demonstrates the point), you don’t want it to refuse to do so simply because someone predicted it will (which anyone easily could).
*This raises the question of whether the AI will realize that in fact you are secretly predicting that it will do the opposite. But once you consider that the AI then has to keep track of probabilities of what people’s true (rather than merely claimed) predictions are, I think it becomes clear that this is just a silly thing to be implementing in the first place. Especially because even if people didn’t go up to it and say “I bet you’re going to try to keep yourself alive”, they would still be implicitly predicting it by expecting it.
But once you have played a couple of games of ‘paper, scissors, rock’, I think it becomes clear that this is just a silly thing to be implementing in the first place.
Yes, that as well. Such an AI would, it seems offhand, be playing a perpetual game of Poisoned Chalice Switcheroo to no real end.