FDT outperforms EDT on
4. You are about to observe one of [$1, $100] in a transparent box, but your action set doesn’t include any self-modifications or precommitments.
5. You are about to observe one of [$1, $100] in a transparent box, but you don’t know about it and will learn the rules of this game only once you already see the box.
Probably false: to show this you would need to describe how to implement FDT on an actual computer without self-modification (which I define broadly enough to include any plausible implementation) or precommitments.
To the extent FDT wins there, it only does so at the expense of losing in more likely scenarios with alternate rules or no rules at all. I already predicted this response, and you are not responding to my predicted-in-advance counter: that FDT loses in scenario 1, for example (which is exactly the same as your scenario 5, except that we start the scenario, and thus measure performance, only after the observation of [$1] in the transparent box, so any gains in alternate universes are ignored in our calculation of utility).
The EDT-agent in (5) ends up in (1) with 99% probability and in scenario (2) with 1% probability. It wins 99%*$1 + 1%*$100 = $1.99 in expectation.
The FDT-agent in (5) ends up in (1) with 1% probability and in scenario (2) with 99% probability. It wins 1%*$0 + 99%*$100 = $99 in expectation.
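A minimal sketch of this arithmetic in Python, using the 99%/1% branch probabilities exactly as stated above (nothing here is derived from the game rules themselves):

```python
# Expected payoffs in scenario (5), reproducing the arithmetic stated above.
# The 0.99/0.01 branch probabilities are taken directly from this comment.

edt_expected = 0.99 * 1 + 0.01 * 100   # EDT-agent: mostly sees the $1 box and takes it
fdt_expected = 0.01 * 0 + 0.99 * 100   # FDT-agent: mostly sees the $100 box; leaves the rare $1

print(f"EDT-agent: ${edt_expected:.2f}")  # $1.99
print(f"FDT-agent: ${fdt_expected:.2f}")  # $99.00
```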
IMO, to say that the FDT-agent loses in (1) and is therefore inferior to the EDT-agent is like saying that it’s better to choose to roll a die that wins on 6 than a die that wins on 1-5, because that option is better in the case where the die rolls 6.
In what exact set of alternate rules does the EDT-agent win more in expectation?
In what exact set of alternate rules does the EDT-agent win more in expectation?
Should be obvious from your scenario 5 example. In 5, at the moment of decision (which really is a pre-action), the agent doesn’t know about the scenario yet. There is an infinite set of such scenarios with many different rules, including the obvious, vastly more likely set of environments where there is no predictor, the predictor is imperfect, the rules are reversed, “FDT agents lose”, etc.
The FDT-agent obviously never decides “I will never ever take $1 from the box”. It decides “I will not take the $1 in the box if the rules of the situation I’m in are like <rules of this game>”.
Only it’s more general, something like “When I realise that it would have been better if I had made some precommitment earlier, I act as I would have acted if I had actually made it” (not sure that this phrasing is fully correct in all cases).
Which means it obviously loses in my earlier situation 1. It is optimal to make binding commitments earlier only because we are defining optimality by measuring across both the [$1] and [$100] universes. But in situation 1 we are measuring utility/optimality only in the [$1] universe, as that is now all that exists, and thus the optimal action (which optimal EDT takes) is to take the $1.
In 1 it is obviously suboptimal to retroactively bind yourself to a hypothetical precommitment you didn’t actually make.
Well, yes, it loses in (1), but that’s fine, because it wins in (4) and (5) and is on par with the EDT-agent in (3). (1) is not the full situation in this game; it’s always a consequence of (3), (4), or (5), depending on interpretation; the rules don’t make sense otherwise.
PS. If the FDT-agent is suddenly teleported into situation (1) in place of some other agent by some powerful entity who can deceive the predictor, so that the predictor predicted the behaviour of the other agent who was in the game before, and the FDT-agent knows all this, it obviously takes the $1. Why not?
As for 4: even just remembering anything is a self-modification of memory.
(1) is not the full situation in this game, it’s always a consequence
From your problem description earlier you said:
If they [Omega] predicted that the agent would leave $1, they put in $100 with 99% probability, otherwise they put in $1.
So some agents do find themselves in (1), and it’s obviously optimal to take the $1 if you can. FDT is in some sense giving up utility here by using a form of retroactive precommitment, hopefully in exchange for utility on other branches. The earlier decision to precommit (whether actually made or later simulated/hallucinated) sacrifices the utility of some future selves in exchange for greater utility for other future selves.
You are about to observe one of [$1, $100] in a transparent box, but you don’t know about it and will learn the rules of this game only once you already see the box.
So the sequence of events from the agent’s perspective is:
A. observe one of [$1,$100] in transparent box (without any context or rules)
B. receive the info about Omega’s predictions
C. decide to take or leave
At moment A and later, the agent has already observed $1 or $100. In universes where they observe $1 at A, the optimal decision at C is to take. In universes where they observe $100 at A, the optimal decision at C is to take.
The FDT move is obviously optimal for 5 only if we measure utility at a point in time before A, when the agent doesn’t know anything about this environment yet (and so could plausibly be in any of an infinite set of alternatives), and we measure only over the subset of universes conditioned on our secret knowledge of the problem setup.
In principle it seems wrong to measure utility at the moment in time right before A on the basis of our knowledge; it seems we should only measure it based on the agent’s knowledge. This means we need to sum our expectation over all possible universes consistent with those facts. The set of universes that proceed to B/C is infinitesimal and probably counterbalanced by opposites, so the very claim that FDT is optimal for 5 is perhaps a form of Pascal’s mugging.
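A small sketch of the two scoring conventions being argued about here, assuming purely for illustration (with the same 99%/1% numbers used earlier in the thread) that a legible commit-to-leave policy faces the $100 box 99% of the time while a take-whatever-you-see policy faces the $1 box 99% of the time:

```python
# Compare "always take" vs "leave the $1" under two scoring conventions:
# (a) ex ante, measured before observation A;
# (b) conditional on having already observed $1 at A (i.e. situation (1)).
# The 0.99/0.01 numbers are the ones used earlier in the thread, not re-derived.

p_see_1_if_taker = 0.99    # a predicted taker usually faces the $1 box
p_see_1_if_leaver = 0.01   # a predicted leaver usually faces the $100 box

# (a) ex ante expected payoff of each policy
take_ex_ante = p_see_1_if_taker * 1 + (1 - p_see_1_if_taker) * 100      # 1.99
leave_ex_ante = p_see_1_if_leaver * 0 + (1 - p_see_1_if_leaver) * 100   # 99.0

# (b) payoff conditional on the agent already seeing $1 in the box
take_given_1 = 1    # take the $1 in front of you
leave_given_1 = 0   # walk away from it

print("ex ante:          take =", take_ex_ante, " leave =", leave_ex_ante)
print("after seeing $1:  take =", take_given_1, " leave =", leave_given_1)
```

Which policy “wins” flips depending on whether you score it at (a) or (b), which is exactly the disagreement about where to put the measuring point.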
We can also construct more specific variants of 5 where FDT loses, such as environments where the message at step B is from an anti-Omega which punishes FDT-like agents.
FDT uses a sort of universal precommitment: from my understanding it’s something like “always honor precommitments your past self would have made (if your past self had your current knowledge)”. Really evaluating whether adopting that universal precommitment pays off seems rather complex. But naturally a powerful EDT agent will simply adopt that universal precommitment when it believes it is in a universe distribution where doing so is optimal! But that does not imply adopting that precommitment is always and everywhere optimal.
We can also construct more specific variants of 5 where FDT loses, such as environments where the message at step B is from an anti-Omega which punishes FDT-like agents.
Sudden thought half a year later:
But what if we restrict reasoning to non-embedded agents? So Omegas of all kinds have access to a perfect Oracle who can predict what you will do, but can’t actually read your thoughts and know that you will do it because you use FDT. I doubt that it is possible in this case to construct a similar anti-FDT situation.
As for 4: even just remembering anything is a self-modification of memory.
That’s for humans, not abstract agents? I don’t think it matters; we’re talking about other self-modifications anyway.
From your problem description
Not mine :)
utility on other branches
Maybe this interpretation is what repels you? Here are another two:
You choose whether to behave like an EDT-agent or like an FDT-agent in the situations where it matters in advance, before you get into (1) or (3). And you can’t, legibly to predictors like the one in this game, decide to behave like an FDT-agent and then, in the future, when you get into (1) because you’re unlucky, just change your mind. It’s just not an option. And between the options “legibly choose to behave like an EDT-agent” and “legibly choose to behave like an FDT-agent”, the second one is clearly better in expectation. You just don’t make another choice in (1) or (2); it’s already decided.
If you find yourself in (1) or (2), you can’t differentiate between the cases “I am the real me” and “I am the model of myself inside the predictor” (because if you could, you could behave differently in these two cases, and it would be a bad model and a bad predictor). So you decide for both at once. (This interpretation doesn’t work well for agents with explicitly self-indicated values (or whatever it is called? I hope it’s clear what I mean.))
The earlier decision to precommit (whether actually made or later simulated/hallucinated) sacrifices the utility of some future selves in exchange for greater utility for other future selves.
Yes. It’s like choosing to win on 1-5 on a die roll rather than winning on a 6. You sacrifice the utility of some future selves (in the worlds where the die rolls 6) in exchange for greater utility for other future selves, and it’s perfectly rational.
We can also construct more specific variants of 5 where FDT loses, such as environments where the message at step B is from an anti-Omega which punishes FDT-like agents.
Ok, yes. You can do it with all other types of agents too.
But naturally a powerful EDT agent will simply adopt that universal precommitment when it believes it is in a universe distribution where doing so is optimal!
I think the ability to legibly adopt such a precommitment, and the willingness to do so, kind of turns an EDT-agent into an FDT-agent.
I think the ability to legibly adopt such a precommitment, and the willingness to do so, kind of turns an EDT-agent into an FDT-agent.
Yes. I think we are mostly in agreement then. FDT seems to be defined by adopting a form of universal precommitment, which you can only do once and can’t really undo. It seems that EDT can clearly do that (to the extent any agent can adopt FDT), so EDT can always go EDT->FDT, but FDT->EDT is not allowed (or it breaks the universal precommitment or cooperation across instances). That does not resolve the question of whether or not adopting FDT is optimal.
My main point from earlier is this:
In principle it seems wrong to measure utility at the moment in time right before A on the basis of our knowledge; it seems we should only measure it based on the agent’s knowledge. This means we need to sum our expectation over all possible universes consistent with those facts. The set of universes that proceed to B/C is infinitesimal and probably counterbalanced by opposites, so the very claim that FDT is optimal for 5 is perhaps a form of Pascal’s mugging.
The agent in scenario 5, before observing the box and the rules, is a superposition of all agents in similar scenarios, and it is only correct for us to judge their performance across that entire set, i.e. according to the agent’s knowledge, not our knowledge. So it’s optimal to take the FDT precommitment in this specific scenario only if it’s optimal to do so over all similar environments, which in this case is nearly all environments, as the agent hasn’t observed anything at all at the start of your scenario 5!
So I think this reduces to the conclusion that FDT and its universal precommitment can’t provide any specific advantage on a specific problem over the regular problem-specific precommitments EDT can make, unless it provides a net advantage everywhere across the multiverse, in which case EDT uses that and becomes FDT.