Sorry, I think there was an issue with the mobile editor. Fixed now. I changed from bold to underlined because bold infinities don’t really show up.
Okay, now I understand your post a bit better. You’re right that UDT doesn’t say whose welfare you should maximize. You have to tell it who you care about. But I’m not sure why that’s a problem for imperfect Parfit’s Hitchhiker. Maximizing the average welfare of all hitchhikers leads to the usual UDT answer. Same for the Catastrophe Button scenario, if persons 1, 2 and 3 all exist anyway, the simple-minded UDT answer doesn’t depend on the presence of the button. The problems only begin when you make the reference class depend on observations, so don’t do that :-)
“Maximizing the average welfare of all hitchhikers leads to the usual UDT answer”—sure, but I’m trying to figure out if that’s what we should be maximising or whether we should be maximising the average welfare of those who actually arrive in town.
“The simple-minded UDT answer doesn’t depend on the presence of the button”—You seem to be assuming that if an agent is included in a problem, that they have to be included in the utility calculation. But why couldn’t we make one of the agents an ape and limit the utility function to only optimise over humans? Actually, I’m confused now. You said, “UDT doesn’t say whose welfare you should maximize”, then you seemed to imply that we should maximise over all hitchhikers. What are you using to justify that?
“The problems only begin when you make the reference class depend on observations, so don’t do that :-)”—Sure, but how do we avoid this? If you know what agent you are, you can just optimise for yourself. But otherwise, the set of people who could be you is dependent on those who make the same observations.
Reflective consistency. If you think you might face Parfit’s Hitchhiker in the future (perfect or imperfect, no matter) and you can self-modify, you’ll self-modify to use the UDT solution. Studying reflectively inconsistent theories seems less useful to me, because AIs will be able to modify their code.
I’m only confident about the Catastrophe Button scenario if persons 1, 2 and 3 are clones. If they are only semi-clones, I agree with you that the justification for UDT is weaker.
Usually for toy UDT problems we assume that observations made within the toy problem don’t “split” your caring. Another question is how to apply that to real life, where decision problems don’t have sharp boundaries. Most likely you should unwind your beliefs to some fixed past moment and “freeze” them into your decision-making. This is the same issue as priors in Bayesian reasoning: how far do you go back to determine your prior? Is your belief that you’re a human a prior or posterior belief? These are interesting questions but I think they are separate from the theory in itself.
And how does this reflective consistency work (I’m not so sure this step can be abstracted out)? Do you average over all agents who were in an identical situation in the past? Or over all agents who were in an equivalent situation (i.e. one wakes up in a red room, another wakes up in an identical room except it is green, etc.)? Or over all agents even if they never had an experience remotely the same? And then there’s the issue of Pascal’s Mugging style problems that begin before the agent ever existed.
Doesn’t the fact that you’re uncertain about the answer seem to imply that UDT is under-defined?
Even if we’ve decided that observations don’t split caring, it isn’t clear which agents to include in the first place. And without knowing this, we can’t apply UDT.
How about this equivalence relation:
1. If two agents have utility functions that are perfectly cooperative with each other (not selfish), they are in the same reference class.
2. If two agents have a common ancestor, they are in the same reference class.
3. Take the transitive closure of (1) and (2).
For example, in Parfit’s Hitchhiker you’re in the same reference class as your ancestor before facing the problem, and in Catastrophe Button with perfect clones, all clones are in the same reference class.
Also the reference class doesn’t tell you how to average utilities, it just tells you whose decisions should be linked. There are many possible choices of averaging, for example you could have Alice and Bob whose decisions are linked but who both care about Alice’s welfare 10x more than Bob’s. That kind of thing should also be an input to UDT, not something dictated by it.
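As a rough illustration of the equivalence relation in rules (1) to (3) above, here is a minimal Python sketch using union-find. The agent names and the `reference_classes` helper are made up for the example, and the pairwise links (cooperative utilities or a common ancestor) are assumed to be given as inputs rather than derived from anything.

```python
def reference_classes(agents, links):
    """Group agents into reference classes.

    `links` is a list of (a, b) pairs that are directly related, either by
    perfectly cooperative utility functions (rule 1) or by a common
    ancestor (rule 2). Union-find then yields the transitive closure (rule 3).
    """
    parent = {a: a for a in agents}

    def find(a):
        # Walk up to the class representative, shortening the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in links:
        parent[find(a)] = find(b)

    classes = {}
    for a in agents:
        classes.setdefault(find(a), set()).add(a)
    return sorted(sorted(c) for c in classes.values())


# Hypothetical example: three perfect clones are linked, a stranger is not.
agents = ["clone1", "clone2", "clone3", "stranger"]
links = [("clone1", "clone2"), ("clone2", "clone3")]
print(reference_classes(agents, links))
```

Note that this only says whose decisions are linked; how the linked agents' utilities are weighted is a separate input, as the comment above points out.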
Interesting. Are there any posts or papers where I can read more about these issues, as I suspect that this is an area that is under-explored and hence probably worthwhile looking into?
This would seem to imply that in an infinite universe everyone will be in the same reference class under certain assumptions. That is, given an agent that experiences state A1 then A2 and another that experiences B1 then B2, there is almost certainly going to be an agent that experiences both state A1 then B1 then some other state C1.
I’m curious. Would you bite this bullet or do you think there’s an issue with how you formulated this?
When I said that the reference class tells you how to average utilities, I meant that it tells you how to average utilities for the purpose of accounting for uncertainty related to your identity.
Further, I think that it is reasonable to begin initial investigations by assuming perfect selfishness. Especially since it ought to be possible to transform any situation where utility functions include other people’s welfare to one where you only care about your own welfare. For example, if Alice experiences a welfare of 1 and Bob experiences a welfare of 2 and both care about Alice ten times as much as Bob, we can reformulate this as Alice experiencing a welfare of 12 and Bob experiencing a welfare of 12 and both parties being perfectly selfish.
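The arithmetic behind that reformulation can be checked with a small Python sketch. The function name `selfish_equivalent` is invented here, and the code only reproduces the Alice and Bob numbers from the comment above; it does not establish that such a transformation is always possible.

```python
def selfish_equivalent(welfare, weights):
    """Re-express a shared, weighted utility as each agent's own welfare.

    Every agent ends up with the same number: the weighted sum that all
    of them were already maximizing.
    """
    shared = sum(weights[name] * welfare[name] for name in welfare)
    return {name: shared for name in welfare}


welfare = {"Alice": 1, "Bob": 2}
weights = {"Alice": 10, "Bob": 1}  # both care about Alice ten times as much as Bob
print(selfish_equivalent(welfare, weights))  # {'Alice': 12, 'Bob': 12}
```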
How so? You and me weren’t cloned from the same person, and that fact doesn’t change even if the universe is infinite.
It can’t tell you that. Imagine that some event E happens with probability 1⁄4 and leads to 8 extra clones of you getting created. If E doesn’t happen, no clones are created. Both the original and all clones wake up in identical rooms and don’t know if E happened. Then each participant is asked if they’d prefer to get a dollar if E happened, or get a dollar if E didn’t happen. The reference class includes all participants, but the decision depends on how their utilities are aggregated: SIA says you should choose the former, SSA says you should choose the latter. That’s why measures of caring, like SIA or SSA, aren’t the same thing as reference classes and must be given to UDT separately.
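To make the dependence on aggregation concrete, here is a minimal Python sketch of the bet described above. It assumes all participants make the same (linked) choice, and it treats "total" aggregation as the SIA-like measure of caring and "average" aggregation as the SSA-like one; that mapping, along with the names `expected_payoff`, `P_E`, `N_IF_E` and `N_IF_NOT_E`, is an assumption made for illustration.

```python
from fractions import Fraction

P_E = Fraction(1, 4)   # probability that E happens
N_IF_E = 9             # the original plus 8 clones
N_IF_NOT_E = 1         # the original only


def expected_payoff(bet_on_e, aggregate):
    """Expected aggregated dollars when every participant takes the same bet.

    `aggregate` is "total" (sum over participants in a world) or
    "average" (mean over participants in a world).
    """
    def world_value(n_participants, bet_wins):
        per_person = 1 if bet_wins else 0
        if aggregate == "total":
            return per_person * n_participants
        return per_person  # the mean of identical payoffs

    return (P_E * world_value(N_IF_E, bet_on_e)
            + (1 - P_E) * world_value(N_IF_NOT_E, not bet_on_e))


for aggregate in ("total", "average"):
    on_e = expected_payoff(True, aggregate)
    on_not_e = expected_payoff(False, aggregate)
    print(aggregate, "| bet on E:", on_e, "| bet on not-E:", on_not_e)
```

Under total aggregation the bet on E comes out ahead (9/4 versus 3/4), while under average aggregation the bet on not-E does (3/4 versus 1/4), even though the set of linked participants is the same in both cases.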
Use → to mean “then”. My point was that if person A experiences A1 → A2, then A is in the same reference class as the agent C who experiences A1 → B1 → C1 according to your common ancestor rule. Similarly, agent B who experiences B1 → B2 is in the same reference class as C for the same reason. By transitivity A and B end up in the same reference class despite lacking a common “ancestor”. The infinite universe is what makes it overwhelmingly likely that such an agent C exists.
Perhaps I wasn’t clear enough in defining what is going on. Each state (A1, A2, B1...) defines not only the current stimuli that an agent is experiencing, but also includes their memory of the past as well (agents can forget things). The point is that these states should be indistinguishable to the agents experiencing them. Is that clear?
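The chaining argument can be sketched in Python as follows. This reads "common ancestor" as "the two histories share at least one indistinguishable state", which is this comment's interpretation rather than an agreed definition; the helper name `classes_by_shared_state` and the dictionary layout are hypothetical.

```python
from itertools import combinations


def classes_by_shared_state(histories):
    """Link agents whose histories share any state, then take the
    transitive closure by repeatedly merging overlapping groups."""
    groups = [{name} for name in histories]
    merged = True
    while merged:
        merged = False
        for g1, g2 in combinations(groups, 2):
            states1 = {s for agent in g1 for s in histories[agent]}
            states2 = {s for agent in g2 for s in histories[agent]}
            if states1 & states2:
                # Some agent in g1 shares a state with some agent in g2.
                groups.remove(g1)
                groups.remove(g2)
                groups.append(g1 | g2)
                merged = True
                break
    return sorted(sorted(g) for g in groups)


without_c = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
with_c = {**without_c, "C": ["A1", "B1", "C1"]}

print(classes_by_shared_state(without_c))  # [['A'], ['B']]
print(classes_by_shared_state(with_c))     # [['A', 'B', 'C']]
```

A and B share no state with each other, yet adding C puts all three in one class, which is the effect of the transitive closure being discussed.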
I don’t think that’s correct. SIA and SSA should give the same answer for all bets (see If a Tree falls on Sleeping Beauty). SIA adjusts for the number of agents who win in the probability, while SSA instead adjusts for this in the decision theory.
Edit: Actually, you are right in that the reference class includes all participants. Before I thought that SIA and SSA would have different reference classes.
I’m only trying to describe how UDT deals with certain bounded problems where we can tell which agents have which ancestors etc. Not trying to solve an infinite universe where all possible problems happen.
That seems limiting. We don’t need to believe in an infinite universe, just believe in a non-zero, non-infinitesimal chance of this. Or going even broader, Agent C doesn’t have to exist, but only needs a non-zero, non-infinitesimal chance in our prior. Why doesn’t this affect your use case?
I don’t know, maybe you’re right. Can you describe a bounded problem where the idea of common ancestors goes wrong?
Oh, I had a typo in my last comment. I wrote: “non-zero, infinitesimal chance” instead of “non-zero, non-infinitesimal chance”.
I wasn’t claiming that it goes wrong for a bounded problem (although it might). This is hard for me to answer as I don’t know what you mean by “bounded problem”. I’ll admit that this isn’t an issue for typical example problems as you are just told a scenario.
But when you are just given a bunch of observations as per real life and have to derive the situation yourself, you will have small amounts of probability on all kinds of weird and wacky theories. And the transitive closure means that an absolutely tiny probability of one scenario can dramatically change the reference class.
Not sure I understand. When simplifying a real life situation into UDT terms, why would you go from “Alice and Bob have a tiny chance of having a common ancestor” to “Alice and Bob are definitely part of the same reference class”?
Because UDT cares about possibilities that don’t occur, no matter how small the odds. For example, Counterfactual Mugging with a 1 in a million chance of the coin coming up heads is still included in the possibility tree. Similarly, A and B are part of the same reference class if C is included in the possibility tree, no matter how small the odds. Not sure if that is clear though.
When I said “If two agents have a common ancestor, they are in the same reference class” I meant that they must have a common ancestor for certain, not just with nonzero probability. That’s also discontinuous (what if two agents have 99% probability of having a common ancestor?) but it works fine on many problems that are simplified from the real world, because simplification often involves certainty.
Do you mean that they must have a common ancestor if they exist or that they must have a common ancestor full stop?
Common ancestor full stop sounds more right.
Well, I suspect that the discontinuity will lead to strange results in practice, such as when you are uncertain of the past. For example, in Counterfactual Mugging, if there is any chance that you are a Boltzmann Brain who was created knowing/believing the coin was tails then you won’t have a common ancestor with brains that actually experienced the problem and saw tails, so you shouldn’t pay. But perhaps you don’t actually have to have an experience, but only believe that you had such an experience in the past?
All theories have limits of applicability. For example, Von Neumann-Morgenstern expected utility maximization requires the axiom of independence, which means you can’t be absent-minded (forgetting something and ending up in a previous mental state, like in the Absent-Minded Driver problem). If there’s even a tiny chance that you’re absent-minded, the problem can no longer be cast in VNM terms. That’s where UDT comes in, it can deal with absent-mindedness and many other things. But if there’s even a tiny chance of having more than one reference class, the problem can no longer be cast in UDT terms either. With multiple reference classes you need game theory, not decision theory.
I suppose the difference is that VNM states the limits within which it operates, while I haven’t seen the limits of UDT described anywhere apart from this conversation.