The Evil Genie Puzzle
Foolish you. You decided to rub a lamp in the hope that a genie would appear, and you found one, only it’s an Evil Genie. He tells you that you have only two options: wishing for your perfect life or being pelted with rotten eggs. You start thinking about how wonderful your new life will be, but of course, there’s a catch.
In the past, wishing for a perfect life would have come with side-effects attached, but due to past abuses, the Council of Genies has created new, updated rules which ban any unwanted side-effects for the person who makes the wish or any of their loved ones. However, there’s no rule against messing with other people. He tells you that he has already predicted your choice. If he predicted that you would wish for a perfect life, he has already created a million clones of you in identical rooms on Mars who believe that they’ve just rubbed the lamp. However, since they’ve only hallucinated this, he has no obligations towards them, and he will use his powers to torture them if they choose the perfect life. On the other hand, if he predicted that you would choose to be pelted with rotten eggs, then he won’t create any clones at all. Genies are well known to be perfect predictors.
Assuming you are a perfectly selfish agent who doesn’t care about their clones, what decision ought you to take? If you choose to be pelted, there’s a 100% chance that if you had instead wished for a perfect life, you would have received it. On the other hand, if you wish for a perfect life, there’s an overwhelming chance that you will wish that you had wished to be pelted. No matter what decision you make, it seems that you will immediately regret having made it (at least until it is revealed whether or not you are the original, in the case where you choose the perfect life).
Extra Info: Some people might argue that perfect clones necessarily are you, but they don’t actually have to be perfect clones. The genie could just clone someone else and give them the memory of finding the lamp and this ought to create the requisite doubt as to whether you are a real human on Earth or a clone on Mars.
A prior version of this problem punished the clones for existing, or if it was predicted that they would choose the perfect life; in the current version, the clones are punished only if they actually choose the perfect life.
I decided to move some of my thoughts to the comments to keep the OP short.
Comparison to the Dr Evil Problem
This is very similar to the Dr Evil problem, but with a few differences. Firstly, the action(s) taken by the genie depend on what the genie predicts that you or your clones will do, rather than on what is actually chosen. Secondly, if the genie predicts that you will choose to be pelted with eggs, your clones will never exist.
I previously argued, for the Dr Evil problem, that you ought to take the blackmail threat as seriously as a regular blackmail threat, since the creation of clones lowers the probability (which was never actually 100%) that you are the real Dr Evil. What is confusing here is that your reference class changes according to your decision. If you choose to be pelted with eggs, you know with 100% probability that you are not a clone. If you choose to be granted the perfect life, then you have a 1,000,000⁄1,000,001 chance of being a clone. This is a problem, as we need to know who we are optimising for before we can say what the optimal solution is. Compare this to the Dr Evil problem, where the chance of being a clone is independent of your choice.
Comparison to the Tropical Paradise Problem
In the Bayesian Conundrum, if you decide not to create any clones, you are indifferent about your decision, as you know you were the original who was always destined to freeze. On the other hand, immediately after you choose to create clones, you are thankful that you did, as it makes it highly likely that you are in fact a clone. Again, this is an issue with differing reference classes.
Motivation
We’ve seen that “all observers who experience state X” can be ambiguous unless we’ve already fixed the decision. My motivation is to deal with problems like Parfit’s Hitchhiker. When the predictor is perfect, paying always results in a better outcome, so the set of observers whose outcomes are optimised doesn’t matter. However, suppose we roll a 100-sided die and there is one world for each possible result. In the world where the die comes up 100, the driver always predicts that you’ll pay, while in the other 99 worlds he is a perfect predictor. Who is referred to by “all observers who experience being in town”? If you pay, then this includes the version of you in all 100 worlds, while if you never pay, it only includes the version of you in world 100. Once we decide which set we should be optimising for, answering the problem is easy (so long as you accept Timeless Decision Theory’s argument that predictors are subjunctively linked to your decision).
Possible Answers
Note: The content below has now been developed into a post here. I would suggest reading the post instead of the rest of the comment.
Suppose an agent A in experiential state S faces a decision D. The three most obvious ways of evaluating these kinds of decisions in my mind are as follows:
1) If C is a choice, calculate the expected utility of C by averaging over all agents who experience S when A chooses C
2) If C and D are choices, compare these pairwise by calculating the average over all agents who experience S when A chooses either C or D (non-existence is treated as a 0).
3) If C is a choice, calculate the expected utility of C by averaging over all agents who experience S given any choice. Again, non-existence is treated as a 0
Unfortunately, there are arguments against each of these possibilities.
Against 1) - No Such Agents
In the imperfect Parfit’s Hitchhiker, 1) defects, while 2) co-operates. I would suggest that 1) is slightly less plausible than 2), since the former runs into difficulties with the perfect Parfit’s Hitchhiker: no agents experience being in town when defect is chosen, yet this option is clearly worse. One response to this empty reference class would be to set the expected utility to 0 in this case, but this would result in us dying in the desert. Another suggestion would be to invalidate any option where there is no agent who experiences S when A chooses C; however, in the Retro Blackmail scenario with a perfect predictor, we want to refuse to pay precisely so that we don’t end up with a blackmail letter. So this doesn’t seem to work either.
Against 2) - Magic Potion
Suppose you have the ability to cast one of two spells:
Happiness Spell: +100 utility for you
Oneness Spell: Makes everyone in the world temporarily adopt your decision-making algorithm and have the same experiences that you are having in this situation, then provides each of them with +1 utility when it wears off, if they chose to cast the Oneness Spell.
2) suggests casting the Oneness Spell, but you only have a reason to choose the Oneness Spell if you think you would choose the Oneness Spell. However, the same also holds for the Happiness Spell. Both spells are the best on average for the people who choose them, but these are different groups of people.
Against 3) - Irrelevant Considerations
Suppose we defend 3). We can imagine adding an irrelevant decision Z that expands the reference class to cover all individuals as follows. Firstly, if it is predicted that you will take option Z, everyone’s minds are temporarily overwritten so that they are effectively clones of you facing the problem under discussion. Secondly, option Z causes everyone who chooses it to lose a large amount of utility, so no-one should ever take it. But according to this criterion, Z would expand the reference class used even when comparing choice C to choice D. It doesn’t seem that a decision that is not taken should be able to do this.
You can ask UDT to maximize either total or average welfare of people who might be you. Both imply that getting pelted with rotten eggs is the right choice.
UDT+SIA: (1 · being pelted with rotten eggs) vs. (1 · perfect life + 10^6 · torture)
UDT+SSA: (1 · being pelted with rotten eggs) vs. ((1 · perfect life + 10^6 · torture) / (1 + 10^6))
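To make the comparison concrete, here is a minimal sketch of those two tallies. The specific utility numbers (eggs = −1, perfect life = +100, torture = −1000) are placeholders assumed purely for illustration; only their ordering matters.

```python
# Sketch of the UDT+SIA / UDT+SSA tallies above. Utility numbers are
# illustrative placeholders, not part of the original problem statement.

N_CLONES = 10**6
U_EGGS, U_LIFE, U_TORTURE = -1, 100, -1000

def evaluate(choice):
    """Return (total, average) welfare over everyone who 'might be you' given the choice."""
    if choice == "eggs":
        outcomes = [U_EGGS]                            # only the original exists
    else:  # "perfect life"
        outcomes = [U_LIFE] + [U_TORTURE] * N_CLONES   # original plus tortured clones
    return sum(outcomes), sum(outcomes) / len(outcomes)

for choice in ("eggs", "perfect life"):
    total, average = evaluate(choice)
    print(f"{choice:12s}  total (SIA) = {total:>13}  average (SSA) = {average:>12.2f}")
# Both criteria favour being pelted with rotten eggs.
```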
Are there any good posts for learning how UDT works with anthropics?
Anyway, I’m not sure that it is quite that simple. The class of people “who might be you” varies according to the decision. So if we go with that answer, we need to justify why it is reasonable to compare the total or average of group A to group B.
It might be interesting to consider what happens if we apply this to Parfit’s Hitchhiker with a predictor that is perfect 99% of the time and always picks you up 1% of the time.
Always defect vs. always co-operate
UDT+SIA: (1 · being picked up and avoiding paying) vs. (100 · being picked up and paying) - co-operate produces a higher total
UDT+SSA: (1 · being picked up and avoiding paying) vs. ((100 · being picked up and paying) / 100) - defect produces a higher average
I haven’t tried applying UDT with anthropics before, so I don’t know if this is correct. But let’s just suppose it is. If UDT+SSA is correct, then it seems like you shouldn’t pay in an imperfect Parfit’s Hitchhiker, which is against the consensus LW view.
On the other hand, caring about the total for people who “might be you” seems weird as well. Suppose button A gives me 101 utility, while button B clones me 99 times and gives me and all of my clones 100 utility. If I 100% believe that I’m the type of person to press button A, I should press button A, while if I 100% believe that I am the type of person to press button B, I should press button B.
What? No!
Here’s how to use UDT with anthropics. Consider all possible observation-action maps you might have. For each such map, imagine all instances of you adopting it, then look at the resulting timeline (or probability distribution over timelines) and find all people whose welfare you care about. Compute their total or average welfare, depending on whether you believe SIA or SSA. Choose the best observation-action map by that criterion.
In Parfit’s Hitchhiker with an imperfect predictor, no matter what observation-action map you have, in each resulting timeline there’s exactly one person you care about. Sometimes he dies in the desert, sometimes he gets saved and pays, sometimes he gets saved without paying. But you always maximize the welfare of that one person, so SIA vs SSA doesn’t matter. Paying leads to 100% chance of getting saved and then paying. Refusing to pay leads to 99% chance of dying, and 1% chance of getting saved without paying. That’s it.
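For illustration, here is a minimal sketch of that procedure for the imperfect Parfit’s Hitchhiker. The utility numbers (dying = −1000, being saved = 0, paying = −100) are placeholders I’ve assumed; the probabilities are the ones given above.

```python
# Sketch of the observation-action-map procedure for the imperfect Parfit's
# Hitchhiker. Utility numbers are illustrative placeholders.

U_DIE, U_SAVED, U_PAY = -1000, 0, -100

# For each map ("pay if in town" / "refuse if in town"), the distribution over
# outcomes for the one person you care about in each resulting timeline.
MAPS = {
    "pay":    {U_SAVED + U_PAY: 1.00},        # always saved, then pays
    "refuse": {U_DIE: 0.99, U_SAVED: 0.01},   # usually dies in the desert
}

for name, outcome_dist in MAPS.items():
    expected = sum(utility * prob for utility, prob in outcome_dist.items())
    print(f"{name:7s}  expected welfare = {expected}")
# Paying comes out ahead (-100 vs. -990 with these numbers), and since exactly
# one person exists in every timeline, SIA vs. SSA makes no difference here.
```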
The way that I’m modelling it, there are 100 worlds, numbered from 1 to 100, but you don’t know which one you are in. In worlds 1-99 the driver is a perfect predictor, and in world 100 the driver always picks you up.
If you always defect and you are in town, you know you are in world 100 and arguably you only care about the hitchhiker in world 100. On the other hand, if you always co-operate and you are in town, you could be in any world, so you should care about all the hitchhikers. So, “find all people whose welfare you care about” isn’t as simple as it first looks. Indeed, if you predict that you defect even just 1⁄99 times, then you should only care about 98 of the 99 hitchhikers who aren’t in the world where you are always picked up.
Why on Earth? I’m updateless, I care about all my a priori possible copies, not only those that get particular observations.
Do you also refuse to pay up in Counterfactual Mugging? That seems far from the intent of UDT. And reflectively inconsistent too, you’re acting differently from how you would’ve precommitted.
Oh ok, so UDT embeds the assumption that you care about all copies? I agree with the first part of the procedure—constructing all observation-action maps and then figuring out what worlds are generated—but I’m trying to figure out whether caring about all copies is justified.
A few days ago I would have agreed with you regarding Counterfactual Mugging, but now I’m not so sure. Firstly, paying in Counterfactual Mugging seems to indicate that you should care about how your decisions affect worlds in which you don’t exist, but in which someone predicts your choice (as per the discussion on l-zombies, though I disagree with the implication that we could be l-zombies).
Secondly, a defector can acknowledge that the predictor is perfect and that they would have lost if the coin had come up the other way, but insist that this doesn’t matter, as the coin didn’t come up that way. Arguably, probability is just a model, and the coin either came up heads or came up tails. So why should we care only about this specific way in which the universe could have been different, and not about other ways in which the universe could have been different, such as Omega paying if and only if it predicts that you wouldn’t pay?
And you’ve got different information after the coin flip, so why ought you be reflectively consistent?
I’m not necessarily saying that you shouldn’t pay, I’m still trying to figure this out and this is what motivated this post.
Well, this line of research started out as part of FAI. We need to figure out how to encode human values into AI and make it stable. So it doesn’t make much sense to study a decision theory that will immediately self-modify into something else if it anticipates counterfactual muggings in the future. It’s better to study whatever it will self-modify into. Same as with game theory—even if you want to study non-equilibrium behavior, figuring out equilibrium first is a better starting point.
Well, that’s not the only reason to look into this problem. What about it being a good practice problem and an opportunity to learn to think more clearly? I mean, you can never know in advance what you’ll discover when you start investigating a problem.
Yeah, agreed. I think if you look deeper into Counterfactual Mugging and when a copy should stop caring about other copies, you’ll eventually arrive at the logical updatelessness problem which is currently unsolved.
Here’s the simplest statement of the problem. Imagine you’re charged with building a successor AI, which will be faced with Counterfactual Mugging with a logical coin. The resulting money will go to you. You know the definition of the coin already (say, the parity of the trillionth digit of pi) but you don’t have enough time to compute it. However, the successor will have enough time to compute the coin. So from your perspective, if your beliefs about the coin are 50:50, you want to build a successor that will pay up if asked—even if the coin’s value will be as obvious to the successor as 2+2=4. So it seems like a strong agent built by a weak agent should inherit its “logical prior”, acting as if some sentences still have nonzero “logical probability” even after they are proven to be false. Nobody knows how such a prior could work, we’ve made several attempts but they pretty much all failed.
Thanks for that explanation. I’ve been reading a lot about various paradoxes recently and I’ve been meaning to get into the issue of logical counterfactuals, but I’m still reading about decision theory more generally, so I haven’t had a chance to dig into it yet. But trying to figure out exactly what is going on in problems like Counterfactual Mugging seems like a reasonable entry point. But I was having trouble understanding what was going on even in the basic Counterfactual Mugging, so I decided to take a step back and try to gain a solid understanding of imperfect predictor Parfit’s Hitchhiker first. But even that was too hard, so I came up with the Evil Genie problem.
Do you now understand UDT’s reasoning in Evil Genie, imperfect Parfit’s Hitchhiker and Counterfactual Mugging? (Without going into the question whether it’s justified.)
Mostly. So you optimise over all agents who experience a particular observation in at least one observation-action map?
Hmm, it seems like this could potentially create issues with irrelevant considerations. As I wrote in my other comment:
“We can imagine adding an irrelevant decision Z that expands the reference class to cover all individuals as follows. Firstly, if it is predicted that you will take option Z, everyone’s minds are temporarily overwritten so that they are effectively clones of you facing the problem under discussion. Secondly, option Z causes everyone who chooses it to lose a large amount of utility, so no-one should ever take it. But according to this criterion, Z would expand the reference class used even when comparing choice C to choice D. It doesn’t seem that a decision that is not taken should be able to do this.”
Extra Information:
So if I wanted to spec this out more precisely:
Assume that there are 100 clones and you’ve just observed that you are in a red room. If Omega predicts that the person in room 1 will choose option A or B, then only room 1 will be red; otherwise, all rooms will be red. The utilities, depending on the choice of the person in room 1, are as follows:
Option A: Provides the person in room 1 with 100 utility and everyone else with −10 utility
Option B: Provides the person in room 1 with 50 utility and everyone else with 0 (I originally accidentally wrote room 2 in this line).
Option C: Provides everyone in all rooms with −1000 utility
If given only Option A or Option B, then you should choose Option A. But once we add in Option C, you could be any of the clones in that observation-action map, so it seems like you ought to prefer Option B over Option A.
It seems like there are two ways to read your problem. Are there “knock-on effects”—if the person in room 1 chooses A, does that make everyone else lose 10 utility on top of anything else they might do? Or does each choice affect only the person who made it?
If there are no knock-on effects, UDT says you should choose A if you’re in a red room or B otherwise. If there are knock-on effects, UDT says you should choose B regardless of room color. In both cases it doesn’t matter if C is available. I think you meant the former case, so I’ll explain the analysis for it.
An “observation-action map” is a map from observations to actions. Here’s some examples of observation-action maps:
1) Always choose B
2) Choose A if you’re in a red room, otherwise B
3) Choose C if you’re in a red room, otherwise A
And so on. There are 9 possible maps if C is available, or 4 if C is not available.
For each map, UDT imagines all instances of itself acting according to that map, and calculates the aggregate utility of all people it cares about in the resulting timeline. (Utilities of people can be aggregated by summing or averaging, which corresponds to UDT+SIA vs UDT+SSA, but in this problem they are equivalent because the number of people is fixed at 100, so I’ll just use summing.) By that criterion it chooses the map that leads to highest utility, and acts according to it. Let’s look at the three example maps above and figure out the resulting utilities, assuming no knock-on effects:
1) Everyone chooses B, so the person in room 2 gets 50 utility and everyone else gets 0. Total utility is 50.
2) The person in room 1 chooses A, so Omega paints the remaining rooms non-red, making everyone else choose B. That means one person gets 100 utility, one person gets 50 utility, and everyone else gets 0. Total utility is 150.
3) The person in room 1 chooses C, so Omega paints the remaining rooms red and everyone else chooses C as well. Total utility is −100000.
And so on for all other possible maps. In the end, map 2 leads to highest utility, regardless of whether option C is available.
Does that make sense?
Oh, I just realised that my description was terrible. Firstly, I said option B added 50 utility to the person in room 2, instead of room 1. Secondly, only the decision of the person in room 1 matters, and it determines the utility everyone else gains.
Any map where the person in room 1 selects option A produces −890 utility in total (100 − 99·10), while any map where they select option B produces 50 in total.
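(As a quick sanity check of those totals, here is a sketch using the corrected spec and 100 people:)

```python
# Totals for the corrected red-room problem: only the decision of the person in
# room 1 matters, and it fixes everyone's utility.

N = 100  # number of rooms/people

def total_utility(choice):
    if choice == "A":
        return 100 + (N - 1) * (-10)   # room 1 gets 100, everyone else -10
    if choice == "B":
        return 50 + (N - 1) * 0        # room 1 gets 50, everyone else 0
    if choice == "C":
        return N * (-1000)             # everyone gets -1000

for choice in "ABC":
    print(choice, total_utility(choice))
# A: -890, B: 50, C: -100000; any map where room 1 picks B comes out ahead.
```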
Yet, suppose we only compared A and B and option C didn’t exist. Then you always know that you are the original as you are in the red room and none of the clones are. The only issue I can see with this reasoning is if UDT insists that you care about all the clones, even when you know that you are the original before you’ve made your decision.
Okay, then UDT recommends selecting B regardless of whether C is available. The decision is “self-sacrificing” from the decider’s point of view, but that’s fine. Here’s a simpler example: we make two copies of you, then one of them is asked to pay ten dollars so the other can get a million. Of course agreeing to pay is the right choice! That’s how you’d precommit before the copies got created. And if FAI is faced with that kind of choice, I’m damn sure I want it to pay up.
So you care about all clones, even if they have different experiences/you could never have been them? I only thought you meant that UDT assumed that you cared about copies who “could be you”.
It seems like we could make them semi-clones who use the same decision-making algorithm as you, but who have a completely different set of memories, so long as that doesn’t affect their process for making the decision. In fact, we could ensure that they have different memories from the first moment of their existence. Why should you care about these semi-clones as much as you care about yourself?
However, if you choose option C, then in addition to being put in red rooms, they have their memories replaced so that they have the same memories as you.
But suppose option C doesn’t exist, would UDT still endorse choosing option B?
At this point the theory starts to become agnostic, it can take an arbitrary “measure of caring” and give you the best decision according to that. If you’re a UDT agent before the experiment and you care equally about all future clones no matter how much they are tampered with, you choose B. On the other hand, if you have zero caring for clones who were tampered with, you choose A. The cutoff point depends on how much you care for inexact clones. The presence or absence of C still doesn’t matter. Does that make sense?
Let’s see if I’ve got it. So you aggregate using the total or average on a per decision basis (or in UDT 1.1 per observation-action mapping) meaning that individuals who count for one decision may not count in another even if they still exist?
Sorry, I’ve tried looking at the formalisation of UDT, but it isn’t particularly easy to follow. It just assumes that you have a utility function that maps from the execution histories of a set of programs to a real number. It doesn’t say anything about how this should be calculated.
Hmm, not sure I understand the question. Can you make an example problem where “individuals who count for one decision may not count in another”?
I don’t think it is possible to construct anything simpler, but I can explain in more detail
Suppose you only care about perfect clones. If you select decision C, then Omega has made your semi-clones actual clones, so you should aggregate over all individuals. However, if you select A, they are still only semi-clones so you aggregate over just yourself.
Is that a correct UDT analysis?
Hmm, now the problem seems equivalent to this:
A) Get 100 utility
B) Get 50 utility
C) Create many clones and give each −1000 utility
If you’re indifferent to mere existence of clones otherwise, you should choose A. Seems trivial, no?
Sure. Then the answer to my question: “individuals who count for one decision may not count in another even if they still exist?” is yes. Agreed?
(Specifically, the semi-clones still exist in A and B; they just haven’t had their memories swapped out in such a way that they would count as clones.)
If you agree, then there isn’t an issue. This test case was designed to create an issue for theories that insist that this ought not to occur.
Why is this important? Because it allows us to create no-win scenarios. Suppose we go back to the genie problem, but where the genie creates semi-clones instead of clones. If you wish to be pelted by eggs, you know that you are the original, so you regret not wishing for the perfect life. The objection you made before doesn’t hold, as you don’t care about the semi-clones. But if you wish for the perfect life, you then know that you are overwhelmingly likely to be a semi-clone, so you regret that decision too.
Yeah, now I see what kind of weirdness you’re trying to point out, and it seems to me that you can recreate it without any clones or predictions or even amnesia. Just choose ten selfish people and put them to sleep. Select one at random, wake him up and ask him to choose between two buttons to press. If he presses button 1, give him a mild electric shock, then the experiment ends and everyone wakes up and goes home. But if he presses button 2, give him a candy bar, wake up the rest of the participants in separate rooms and offer the same choice to each, except this time button 1 leads to nothing and button 2 leads to shock. The setup is known in advance to all participants, and let’s assume that getting shocked is as unpleasant as the candy bar is pleasant.
In this problem UDT says you should press button 1. Yeah, you’d feel kinda regretful having to do that, knowing that it makes you the only person to be offered the choice. You could just press button 2, get a nice candy bar instead of a nasty shock, and screw everyone else! But I still feel that UDT is more likely to be right than some other decision theory telling you to press button 2, given what that leads to.
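A minimal sketch of that comparison, taking the shock and the candy bar to be exactly ∓1 utility (a normalisation consistent with the setup above, not something specified in it):

```python
# Ten participants, one woken first. If the first presses button 1, they get a
# shock and the experiment ends. If they press button 2, they get a candy bar
# and the other nine are woken; for them button 1 does nothing and button 2
# shocks. Everyone reasons identically, so everyone follows the same policy.

N, SHOCK, CANDY = 10, -1, +1

def total_welfare(policy):
    if policy == 1:
        return SHOCK                     # first person shocked, nobody else woken
    else:
        return CANDY + (N - 1) * SHOCK   # first gets candy, the other nine get shocked

for policy in (1, 2):
    total = total_welfare(policy)
    print(f"everyone presses {policy}: total = {total}, average = {total / N}")
# Pressing button 1 wins on both the total and the average.
```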
Perhaps it is, but I think it is worth spending some time investigating this and identifying the advantages and disadvantages of different resolutions.
Hmm, I’m not sure that it works. If you press 1, it doesn’t mean that you’re the first person woken. You need them to be something like semi-clones for that. And the memory trick is only for the Irrelevant Considerations argument. That leaves the prediction element which isn’t strictly necessary, but allows the other agents in your reference class (if you choose perfect life) to exist at the same time the original is making its decision, which makes this result even more surprising.
Agree about the semi-clones part. This is similar to the Prisoner’s Dilemma: if you know that everyone else cooperates (presses button 1), you’re better off defecting (pressing button 2). Usually I prefer to talk about problems where everyone has the same preference over outcomes, because in such problems UDT is a Nash equilibrium. Whereas in problems where people have selfish preferences but cooperate due to symmetry, like this problem or the symmetric Prisoner’s Dilemma, UDT still kinda works but stops being a Nash equilibrium. That’s what I was trying to point out in this post.
Simple cheat solution:
“the Council of Genies has created new, updated rules which ban any unwanted side-effects for the person who makes the wish or any of their loved ones.”
I would argue that I love everyone, by default, especially people in this kind of cruel situation. Therefore this would count as an unwanted side-effect.
There is a fair bit of fancy but useless fluff in this problem. The predictor is irrelevant, the copies are irrelevant. The problem is actually that you are one of 101 people offered to pick A or B, and for 100 of them A means win, B means lose, while for one of them A means lose and B means win. You pick A because of the expected winnings. That’s it.
The problem was constructed this way for a specific reason: so that no matter what choice you make, you regret it the instant after you make it (before it is revealed what the actual outcome is).
Oops, I missed the crucial part about the clones only being created, asked and tortured if you (and hence they) pick the ostensibly winning outcome. In this case I would pick that choice, not the regret one, because as a clone you have no choice in this setup, you are just being a puppet.
I’m starting to wonder if much of the confusion around topics like this is due to undefined and inconsistent/changing intuitions about utility from your knowledge of copies/clones. It seems weird not to care about clones, and even weirder not to care about whether a potential clone is reified.
Also, since perfect prediction removes causality from the equation, you can just let the genie cheat and perform whatever evil he’s going to do after you choose. And finally, “perfect life” implies infinite utility, so I’m ignoring that part and replacing it with “gets a thing they really want”, to avoid mixing up multiple distinct thought problems.
The puzzle seems equivalent to “there are a million and one people somewhat similar to you, in that they think they’ve found a lamp and are being offered this puzzle. One of them gets a perfect life if that’s their choice, and pelted with eggs if that’s their choice. A million of them get tortured if the one chooses a perfect life and simply disappear like they never existed if the one chooses the eggs”
I don’t see any regret in choosing the eggs. I want to be the kind of person who’ll make the sacrifice if it erases rather than tortures a million similar entities. Never existing is only slightly worse than existing briefly and being erased, so the logic holds.
Did you see my discussion of semi-clones above (ctrl-f should find it for you)? Do you believe that you necessarily care about them? You might also want to look at the True Prisoner’s Dilemma to better understand the intuition behind not co-operating.
Yes, and I’m not sure it helps—I care about everyone, at least a little bit. I seem to care about people closer to me more than people distant, but I think I’d agree to be pelted with eggs to prevent a million tortured people from existing. There is some number less than a million where my intuition flips, and I suspect that I’m inconsistent and dutch-book-able in the details (although log utility on number of other-tortures-prevented might save me).
I don’t know a good way to construct the conundrum you’re trying to: where I don’t care about the copies except for the one which is me. I kind of think identity doesn’t work that way—a perfect copy _is_ me, an imperfect copy is imperfectly-me. I am you, if I had your genes and experiences rather than mine.
EDIT: epistemic status for this theory of identity—speculative. There is something wrong with the naive intuition that “me” is necessarily a singular thread through time, and that things like sleeping and significant model updates (and perhaps every update) don’t create any drift in caring for some situations more than others.
Sure that is what you prefer, but is a selfish agent incoherent?
I think so. Caring about a future being (“self”) only in the case where there’s physical continuity (imperfect as it is) and excluding continuity through copying is wrong. The distinction between (imperfect) continuity and (partial) similarity likewise seems broken.
This old post of mine is somewhat confusingly written, but is extremely relevant. Anthropic selfish preferences as an extension of TDT.
Yeah, I’m finding your position regarding the Tropical Paradise sort of confusing. So your argument is that taking your oath is lying to yourself, but making the oath is also good because it increases the chance that you are a clone?
More or less correct. You don’t have to lie to yourself, but the benefit to you is entirely indirect. You’ll never causally help yourself, but you (that is, the person making the decision) are better off as a logical consequence of your decision procedure having a particular output.
Anyhow, applying this same reasoning, you should choose (assuming the evil genie isn’t messing up somewhere) getting pelted with rotten eggs, from the perspective of a completely selfish agent acting in the moment.
One distinction is that immediately after selecting rotten eggs, you regret the decision as you know you are the real version, but immediately after choosing “don’t create” you are indifferent. I’m still not quite confident about how these situations work.
Just like you don’t have to lie to yourself to understand that you’re making a choice that will benefit you logically but not causally, you also don’t have to regret it when you make a choice that is causally bad but indirectly good. You knew exactly what the two options were when you were weighing the choices—what is there to regret? I’d just regret ever rubbing the lamp in the first place.
Maybe another part of the difference between our intuitions is that I don’t think of the clones case as “one real you and a million impostors,” I think of it as “one million and one real yous.”
I discuss semi-clones (above) - if you insist that any individual cares about clones, perhaps you’d be persuaded that they mightn’t care about semi-clones?
“You knew exactly what the two options were when you were weighing the choices”—ah, but it was only after your choice was finalised that you knew whether there was a single individual or clones and that affects the reference class that you’re optimising.
I think you’re mixing up my claim about states of knowledge with a claim about caring, which I am not making. You can care only about yourself and not care about any copies of you, and still have a state of knowledge in which you really accept the possibility that your decision is controlling which person you are more likely to be. This can often lead to the same decisions as if you had precommitted based on caring about all future copies equally, but I’m not talking about that decision procedure.
Yes, this is exactly the same as the cases I discuss in the linked post, which I still basically endorse. You might also think about Bayesian Probabilities Are For Things That Are Space-like Separated From You: there is a difference in how we have to treat knowledge of outside events, and decisions about what action we should take. There is a very important sense in which thinking about when you “know” which action you will take is trying to think about it in the wrong framework.
It’s not exactly a puzzle that game theory doesn’t always give pure solutions. This puzzle should still have a solution in mixed strategies, assuming the genie can’t predict quantum random number generators.
“No matter what decision you make, it seems that you will inevitably regret it”—this property depends on the genie being able to predict you. If you break this property, then you are addressing a different scenario.
“It’s not exactly a puzzle that game theory doesn’t always give pure solutions”—interesting comparison, but I don’t see how it is analogous. Even if you implement a probabilistic strategy, you still regret your decision immediately after you make it. In contrast, with ordinary mixed-strategy solutions, you should still endorse the decision right after having made it.