There are lots of ordinary examples in game theory of time inconsistent choices. Once you know how to resolve them, then if you can’t use those approaches to resolve this I might be convinced that anthropic updating is at fault. But until then I think you are making a huge leap to blame anthropic updating for the time inconsistent choices.
Robin, you’re jumping into the middle of a big extended discussion. We’re not only blaming anthropic updating, we’re blaming Bayesian updating in general, and proposing a decision theory without it (Updateless Decision Theory, or UDT). The application to anthropic reasoning is just that, an application.
UDT seems to solve all cases of time inconsistency in decision problems with one agent. What UDT agents do in multi-player games is still an open problem that we’re working on. There was an extensive discussion about it in the previous threads if you want to see some of the issues involved. But the key ingredient that is missing is a theory of logical uncertainty, that tells us how different agents (or more generally, computational processes) are logically correlated to each other.
The ordinary time inconsistencies in game theory are all regarding multiple agents. Seems odd to suggest you’ve solved the problem except for those cases.
I was referring to problems like Newcomb’s Problem, Counterfactual Mugging, Sleeping Beauty, and Absentminded Driver.

Not exactly the way I would phrase it, but Timeless Decision Theory and Updateless Decision Theory between them have already killed off a sufficiently large number of time inconsistencies that treating any remaining ones as a Problem seems well justified. Yes, we have solved all ordinary dynamic inconsistencies of conventional game theory already!
Let’s take the simple case of time inconsistency regarding punishment. There is a two-stage game with two players. First A decides whether to cheat B for some gain. Then B decides whether to punish A at some cost. Before the game B would like to commit to punishing A if A cheats, but once A has already cheated, B would rather not punish.
In UDT, we blame this time inconsistency on B’s updating on A having cheated (i.e. treating it as a fact that can no longer be altered). Suppose it’s common knowledge that A can simulate or accurately predict B. Then B should reason that by deciding to punish, it increases the probability that A would have predicted that B would punish, and thus decreases the probability that A would have cheated.
But the problem is not fully solved, because A could reason the same way, and decide to cheat no matter what it predicts that B does, in the expectation that B would predict this and see that it’s pointless to punish.
So UDT seems to eliminate time-inconsistency, but at the cost of increasing the number of possible outcomes, essentially turning games with sequential moves into games with simultaneous moves, with the attendant increase in the number of Nash equilibria. We’re trying to work out what to do about this.
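To make the reasoning in the comment above concrete, here is a minimal sketch in Python (the payoff numbers are illustrative assumptions, not part of the game as specified): B compares whole policies, first under the assumption that A accurately predicts B’s policy and best-responds to it, then under the assumption that A cheats no matter what.

    # Illustrative payoffs (assumed): cheating gains A 2 and costs B 2;
    # punishing costs B 1 and costs A 3 (so a punished cheater nets -1).
    GAIN, VICTIM_LOSS, PUNISH_COST_B, PUNISH_COST_A = 2, 2, 1, 3

    def payoffs(a_cheats, b_punishes):
        """Return (A's payoff, B's payoff) for one play of the game."""
        a = GAIN if a_cheats else 0
        b = -VICTIM_LOSS if a_cheats else 0
        if a_cheats and b_punishes:
            a -= PUNISH_COST_A
            b -= PUNISH_COST_B
        return a, b

    # Case 1: A accurately predicts B's policy and best-responds to it.
    for b_policy in (True, False):          # True = "punish if cheated"
        a_choice = max((True, False), key=lambda a: payoffs(a, b_policy)[0])
        print("B punishes:", b_policy, "-> A cheats:", a_choice,
              "-> B gets:", payoffs(a_choice, b_policy)[1])
    # B punishes: True -> A cheats: False -> B gets: 0
    # B punishes: False -> A cheats: True -> B gets: -2
    # The "punish" policy looks better to B, which is the UDT argument above.

    # Case 2: A has decided to cheat no matter what it predicts B will do.
    for b_policy in (True, False):
        print("B punishes:", b_policy, "-> B gets:", payoffs(True, b_policy)[1])
    # B punishes: True -> B gets: -3
    # B punishes: False -> B gets: -2
    # Against such an A, punishing only hurts B, which is the unresolved part.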
Er, turning games with sequential moves into games with simultaneous moves is standard in game theory, and “never cheat, always punish cheating” and “always cheat, never punish” are what are considered the Nash equilibria of that game in standard parlance. [ETA: Well, “never cheat, punish x% of the time” will also be a NE for large enough x.] It is subgame perfect equilibrium that rules out “never cheat, always punish cheating” (the set of all SPE of a sequential game is a subset of the set of all NE of that game).
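For what it’s worth, a quick check of that claim under the same illustrative payoffs as in the sketch above (again, the numbers are assumptions): enumerate the pure strategy pairs of the simultaneous-move version of the game, test which ones are Nash equilibria, and compute the smallest punishment probability x mentioned in the ETA.

    # Strategies: A in {cheat, refrain}; B in {always punish cheating, never punish}.
    # Assumed payoffs as before: cheating gains A 2 and costs B 2; punishment
    # costs B 1 and costs A 3.
    def payoffs(a_cheats, b_punishes):
        a = 2 if a_cheats else 0
        b = -2 if a_cheats else 0
        if a_cheats and b_punishes:
            a, b = a - 3, b - 1
        return a, b

    def is_nash(a_cheats, b_punishes):
        a_now, b_now = payoffs(a_cheats, b_punishes)
        a_dev, _ = payoffs(not a_cheats, b_punishes)   # A deviates unilaterally
        _, b_dev = payoffs(a_cheats, not b_punishes)   # B deviates unilaterally
        return a_dev <= a_now and b_dev <= b_now

    for a in (False, True):
        for b in (False, True):
            label_a = "cheat" if a else "never cheat"
            label_b = "always punish cheating" if b else "never punish"
            print(label_a, "/", label_b, "-> NE?", is_nash(a, b))
    # never cheat / never punish -> NE? False
    # never cheat / always punish cheating -> NE? True
    # cheat / never punish -> NE? True
    # cheat / always punish cheating -> NE? False

    # Mixed version: if B punishes with probability x, A's expected payoff from
    # cheating is 2 - 3x, so "never cheat, punish x% of the time" is also a NE
    # for any x >= 2/3 (with these assumed payoffs).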
Yeah, I used the wrong terminology in the grandparent comment. I guess the right way to put it is that SPE/backwards induction no longer seems reasonable under UDT, and it’s unclear what can take its place as a way of reducing the number of possible solutions to a given game.
It is subgame perfect equilibrium that rules out “never cheat, always punish cheating” (the set of all SPE of a sequential game is a subset of the set of all NE of that game).
How strictly do you (or the standard approach) mean to rule out options that aren’t good on all parts of the game? It seems like sometimes you do want to do things that are subgame suboptimal.
Edit: or at least be known to do things, which unfortunately can require actually being prepared to do the things.
Well, the classical game theorist would reply that they’re studying one-off games, in which the game you’re currently playing doesn’t affect any payoff you get outside that game (otherwise that should be made part of the game), so either you can’t be doing the punishment because you want to be known to be a punisher, or the game that Robin specified doesn’t model the situation you’re in. The classical game theorist assumes you can’t look into people’s heads, so whatever you say or do before the cheating, you’re always free to not punish during the punishment round (as you’re undoubtedly aware, mutual checking of source code is prohibited by antitrust laws in over 185 countries).
The classical game theorist would further point out that if you do want to model that punishment helps you be known as a punisher, then you should use their theory of repeated games, where they have some folk theorems for you saying that lots and lots of things can be Nash equilibria, e.g. in a game where after each round there is a fixed probability of another round: for example, cooperation in the prisoner’s dilemma, but also all sorts of suboptimal outcomes (which become Nash equilibria because any deviator gets punished as badly as the other players can punish them).
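As a rough illustration of the folk-theorem point (the stage payoffs T, R, P, S below are assumed, conventional Prisoner’s Dilemma values): grim trigger, i.e. cooperate until the other player defects and then defect forever, sustains cooperation whenever the probability of another round is high enough.

    # Assumed stage payoffs: temptation > reward > punishment > sucker.
    T, R, P, S = 5, 3, 1, 0

    def cooperate_forever(delta):
        # Expected total payoff from mutual cooperation, continuation probability delta.
        return R / (1 - delta)

    def defect_against_grim_trigger(delta):
        # Defect now for T, then receive the punishment payoff P in every later round.
        return T + delta * P / (1 - delta)

    # Cooperation is sustainable iff R/(1-d) >= T + d*P/(1-d), i.e. d >= (T-R)/(T-P).
    threshold = (T - R) / (T - P)
    for delta in (0.3, 0.5, 0.7):
        ok = cooperate_forever(delta) >= defect_against_grim_trigger(delta)
        print("delta =", delta, "-> cooperation sustainable?", ok)
    print("threshold:", threshold)   # 0.5 with these assumed payoffs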
I should point out that not all classical game theorists think that SPE makes particularly good predictions, though; I’ve read someone say, I think Binmore, that you expect to virtually always see a NE in the laboratory after a learning period, but not an SPE, and that the original inventor of SPE actually came up with it as an example of what you would not expect to see in the lab, or something to that effect. (Sorry, I should really chase down that reference, but I don’t have time right now. I’ll try to remember to do that later. ETA: Ok, Binmore and Shaked, 2010: Experimental Economics: Where Next? Journal of Economic Behavior & Organization, 73: 87-100. See the stuff about backward induction, starting at the bottom of p.88. The inventor of SPE is Reinhard Selten, and the claim is that he didn’t believe it would predict what you see in the lab and “[i]t was to demonstrate this fact that he encouraged Werner Güth (...) to carry out the very first experiment on the Ultimatum game”, not that he invented SPE for this purpose.)
so whatever you say or do before the cheating, you’re always free to not punish during the punishment round
Interesting. This idea, used as an argument for SPE, seems to be the free will debate intruding into decision theory. “Only some of these algorithms have freedom, and others don’t, and humans are free, so they should behave like the free algorithms.” This either ignores, or accepts, the fact that the “free” algorithms are just as deterministic as the “unfree” algorithms. (And it depends on other stuff, but that’s not the fun bit)
(as you’re undoubtedly aware, mutual checking of source code is prohibited by antitrust laws in over 185 countries).
Hm, I may not quite have gotten the point across: I think you may be thinking of the argument that humans have free will, so they can’t force future versions of themselves to do something that would be against that future version’s interests given its information, but that isn’t the argument I was trying to explain. The idea I was referring to works precisely the same way with deterministic algorithms, as long as the players only get to observe each other’s actions, not each other’s source (though of course its proponents don’t think in those terms). The point is that if the other player looks at you severely and suggestively taps their baseball bat and tells you about how they’ve beaten up people who have defected in the past, that still doesn’t mean that they’re actually going to beat you up—since if such threats were effective on you, then making them would be the smart thing to do even if the other player has no intention of actually beating you up (and risking jail) if for some reason you end up defecting. (Compare AI-in-the-box...) (Of course, this argument only works if you’re reasonably sure that the other player is a classical game theorist; if you think you might be playing against someone who will, “irrationally”, actually punish you, like a timeless decision theorist, then you should not defect, and they won’t have to punish you...)
Now, if you had actual information about what this player had done in similar situations in the past, like police reports of beaten-up defectors, this argument wouldn’t work, but then (the standard argument continues) you have the wrong game-theoretical model; the correct model includes all of the punisher’s previous interactions, and in that game, it might well be a SPE to punish. (Though only if the exact number of “rounds” is not certain, for the same reason as in the finitely iterated Prisoner’s Dilemma: in the last round the punisher has no more reason to punish because there are no future targets to impress, so you defect no matter what they did in previous rounds, so they have no reason to punish in the second-to-last round, etc.)
I think you may be thinking of the argument that humans have free will, so they can’t force future versions of themselves to do something that would be against that future version’s interests given its information
That is not what I was thinking of. Here, let me re-quote the whole sentence:
The classical game theorist assumes you can’t look into people’s heads, so whatever you say or do before the cheating, you’re always free to not punish during the punishment round
The funny implication here is that if someone did look into your head, you would no longer be “free.” Like a lightswitch :P And then if they erased their memory of what they saw, you’re free again. Freedom on, freedom off.
And though that is a fine idea to define, mixing it up with an algorithmic use of “freedom” seems to just be arguing “by definition.”
Ok, sorry I misread you. “Free” was just my word rather than part of the standard explanation, so alas we don’t have anybody we can attribute that belief to :-)
(The difficulty arises if UDT B reasons logically that there should not logically exist any copies of its current decision process finding themselves in worlds where A is dependent on its own decision process, and yet A defects. I’m starting to think that this resembles the problem I talked about earlier, where you have to use Omega’s probability distribution in order to agree to be Counterfactually Mugged on problems that Omega expects to have a high payoff. Namely, you may have to use A’s logical uncertainty, rather than your own logical uncertainty, in order to perceive a copy of yourself inside A’s counterfactual. This is a complicated issue and I may have to post about it in order to explain it properly.)
Drescher-Nesov-Dai UDT solves this (that is, goes ahead and punishes the cheater, making the same decision at both times).
TDT can handle Parfit’s Hitchhiker—pay for the ride, make the same decision at both times, because it forms the counterfactual “If I did not pay, I would not have gotten the ride”. But TDT has difficulty with this particular case, since it implies that B’s original belief that A would not cheat if punished, was wrong; and after updating on this new information, B may no longer have a motive to punish. (UDT of course does not update.) Since B’s payoff can depend on B’s complete strategy tree including decisions that would be made under other conditions, instead of just depending on the actual decision made under real conditions, this scenario is outside the realm where TDT is guaranteed to maximize.
The case is underspecified:

How transparent/translucent are the agents? I.e. can A examine B’s source code, or use observational and other data to assess B’s decision procedure? If not, what is A’s prior probability distribution for decision procedures B might be using?
Are both A and B using the same decision theory, TDT/UDT? Or is A using CDT and B using TDT/UDT or vice versa?
Clearly B has mistaken beliefs about either A or its own dispositions; otherwise B would not have dealt with A in the interaction where A ended up cheating. If B uses UDT (and hence will carry through punishments), and A uses any DT that correctly forecasts B’s response to cheating, then A should not in fact cheat. If A cheats anyway, though, B still punishes.
Actually, on further reflection, it’s possible that B would reason that it is logically impossible for A to have the specified dependency on B’s decision, and yet for A to still end up defecting, in which case even UDT might end up in trouble—it would be a transparent logical impossibility for A to defect if B’s beliefs about A are true, so it’s not clear that B would handle the event correctly. I’ll have to think about this.
If there is some probability of A cheating even if B precommits to punishment, but with odds in B’s favor, the situation where B needs to implement punishment is quite possible (expected). Likewise, if B precommitting to punish A is predicted to lead to an even worse outcome than not punishing (because of punishment expenses), UDT B won’t punish A. Furthermore, a probability of cheating and non-punishment of cheating (mixed strategies, possibly based on logical uncertainty to defy the laws of the game if pure strategies are required) is a mechanism through which the players can (consensually) bargain with each other in the resulting parallel game, an issue Wei Dai mentioned in the other reply. B doesn’t need absolute certainty at any stage, in either case.
Also, in UDT there are no logical certainties, as it doesn’t update on logical conclusions either.
If there is some probability of A cheating even if B precommits to punishment
Sure, but that’s the convenient setup. What if for A to cheat means that you were necessarily just mistaken about which algorithm A runs?
Also, in UDT there are no logical certainties, as it doesn’t update on logical conclusions either.
UDT will be logically certain about some things but not others. If UDT B “doesn’t update” on its computation about what A will do in response to B, it’s going to be in trouble.
What if for A to cheat means that you were necessarily just mistaken about which algorithm A runs?
A decision algorithm should never be mistaken, only uncertain.
UDT will be logically certain about some things but not others. If UDT B “doesn’t update” on its computation about what A will do in response to B, it’s going to be in trouble.
“Doesn’t update” doesn’t mean that it doesn’t use the info (but you know that, so what do you mean?). A logical conclusion can be a parameter in a strategy without making the algorithm unable to reason about what it would be like if the conclusion were different; that is basically about the uncertainty of the same algorithm in other states of knowledge.

Am I correct in assuming that if A cheats and is punished, A suffers a net loss?

Yes.
What is the remaining Problem that you’re referring to? Why can’t we apply the formalism of UDT1 to the various examples people seem to be puzzled about and just get the answers out? Or is cousin_it right about the focus having shifted to how human beings ought to reason about these problems?
The anthropic problem was a remaining problem for TDT, although not UDT.
UDT has its own problems, possibly. For example, in the Counterfactual Mugging, it seems that you want to be counterfactually mugged whenever Omega has a well-calibrated distribution and has a systematic policy of offering high-payoff CMs according to that distribution, even if your own prior has a different distribution. In other words, the key to the CM isn’t your own distribution, it’s Omega’s. And it’s not possible to interpret UDT as epistemic advice, which leaves anthropic questions open. So I haven’t yet shifted to UDT outright.
(The reason I did not answer your question earlier was that it seemed to require a response at greater length than the above.)
Well, you’re right in the sense that I can’t understand the example you gave. (I waited a couple of days to see if it would become clear, but it didn’t.) But the rest of the response is helpful.
Did he ever get around to explaining this in more detail? I don’t remember reading a reply to this, but I think I’ve just figured out the idea: Suppose you get word that Omega is coming to the neighbourhood and is going to offer counterfactual muggings. What sort of algorithm do you want to self-modify into? You don’t know what CMs Omega is going to offer; all you know is that it will offer odds according to its well-calibrated prior. Thus, it has higher expected utility to be a CM-accepter than a CM-rejecter, and even a CDT agent would want to self-modify.
I don’t think that’s a problem for UDT, though. What UDT will compute when asked to pay is the expected utility under its prior of paying up when Omega asks it to; thus, the condition for UDT to pay up is NOT
prior probability of heads * Omega's offered payoff > prior of tails * Omega's price
but
prior of (heads and Omega offers a CM for this coin) * payoff > prior of (tails and CM) * price.
In other words, UDT takes the quality of Omega’s predictions into account and acts as if updating on them (the same way you would update if Omega told you who it expects to win the next election, at 98% probability).
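A toy numeric version of the contrast (every number here is an assumption: the $10,000/$100 stakes, a raw prior of 0.001 for heads, and the belief that Omega only offers CMs on coins to which its well-calibrated estimate assigns 50/50 odds):

    PAYOFF, PRICE = 10000, 100
    p_heads_raw = 0.001        # your own prior about this coin, ignoring Omega

    # If Omega only offers CMs on coins it considers 50/50, and you trust its
    # calibration, your joint prior splits the "CM offered" mass evenly:
    p_cm = 0.2                 # assumed prior probability that Omega offers a CM at all
    p_heads_and_cm = 0.5 * p_cm
    p_tails_and_cm = 0.5 * p_cm

    # The "NOT" condition (raw prior over the coin): refuse to pay.
    print(p_heads_raw * PAYOFF > (1 - p_heads_raw) * PRICE)    # False (10 vs ~99.9)

    # The condition UDT evaluates (joint with "Omega offers this CM"): pay up.
    # This is the sense in which it acts as if it had updated on Omega's prediction.
    print(p_heads_and_cm * PAYOFF > p_tails_and_cm * PRICE)    # True (1000 vs 10)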
CDT agents, as usual, will actually want to self-modify into a UDT agent whose prior equals the CDT agent’s posterior [ETA: wait, sorry, no, they won’t act as if they can acausally control other instances of the same program, but they will self-modify so as to make future instances of themselves (which obviously they control causally) act in a way that maximizes EU according to the agent’s present posterior, and that’s what we need here], and will use the second formula above accordingly—they don’t want to be a general CM-rejecter, but they think that they can do even better than being a general CM-accepter if they refuse to pay up if at the time of self-modification they assigned low probability to tails, even conditional on Omega offering them a CM.
He never explained further, and actually I still don’t quite understand the example even given your explanation. Maybe you can reply directly to Eliezer’s comment so he can see it in his inbox, and let us know if he still thinks it’s a problem for UDT?
But the key ingredient that is missing is a theory of logical uncertainty, that tells us how different agents (or more generally, computational processes) are logically correlated to each other.
I’d look for it as a logical theory of concurrency and interaction: “uncertainty” fuzzifies the question.
I’d look for it as a logical theory of concurrency and interaction: “uncertainty” fuzzifies the question.
Why? For me, how different agents are logically correlated to each other seems to be the same type of question as “what probability (if any) should I assign to P!=NP?” Wouldn’t the answer fall out of a general theory of logical uncertainty? (ETA: Or at least be illuminated by such a theory?)
Logic is already in some sense about uncertainty (e.g. you could interpret predicates as states of knowledge). When you add one more “uncertainty” of some breed, it leads to a perversion of logic, usually of an applied character and barren meaning.

The concept of “probability” is suspect; I don’t expect it to have foundational significance.
So what would you call a field that deals with how one ought to make bets involving P!=NP (i.e., mathematical statements that we can’t prove to be true or false), if not “logical uncertainty”? Just “logic”? Wouldn’t that cause confusion in others, since today it’s usually understood that such questions are outside the realm of logic?
I don’t understand how to make such bets, except in the sense that it’s one of the kinds of human decision-making that can be explicated in terms of priors and utilities. The logic of this problem is in the process that works with the statement, which is in the domain of proof theory.