I also had thoughts along these lines—variants of TDT could logically separate themselves, so that T-0 one-boxes when it is simulated, but T-1 has proven that T-0 will one-box, and hence T-1 two-boxes when T-0 is the sim.
But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren’t linked) then they won’t co-operate with each other in Prisoner’s Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.
The second difficulty is that for each specific TDT variant, one with algorithm T’ say, there will be a specific problematic problem on which T’ does worse than CDT (and indeed worse than all the other variants of TDT): namely, the problem in which T’ is the exact algorithm running in the sim. So we still don’t get the—desirable—property that there is some sensible decision theory called TDT that is optimal across fair problems.
The best suggestion I’ve heard so far is that we try to adjust the definition of “fairness”, so that these problematic problems also count as “unfair”. I’m open to proposals on that one...
I think this is avoidable. Let’s say that there are two TDT programs called Alice and Bob, which are exactly identical except that Alice’s source code contains a comment identifying it as Alice, whereas Bob’s source code contains a comment identifying it as Bob. Each of them can read their own source code. Suppose that in problem 1, Omega reveals that the source code it used to run the simulation was Alice. Alice has to one-box. But Bob faces a different situation than Alice does, because he can find a difference between his own source code and the one Omega simulated, whereas Alice could not. So Bob can two-box without affecting what Alice would do.
However, if Alice and Bob play the prisoner’s dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the “Alice” comment replaced with “Bob”, and Bob faces a player identical to itself except with the “Bob” comment replaced with “Alice”. Hopefully, their algorithm would compress this information down to “The other player is identical to me, but has a comment difference in its source code”, at which point each player would be in an identical situation.
You might want to look at my follow-up article which discusses a strategy like this (among others). It’s worth noting that slight variations of the problem remove the opportunity for such “sneaky” strategies.
Ah, thanks. I had missed that, somehow.
In a prisoner’s dilemma, Alice and Bob affect each other’s outcomes. In the Newcomb problem, Alice affects Bob’s outcome, but Bob doesn’t affect Alice’s outcome. That’s why it’s OK for Bob to consider himself different in the second case as long as he knows he is definitely not Alice (because otherwise he might actually be in a simulation), but not OK for him to consider himself different in the prisoner’s dilemma.
Why doesn’t that happen when dealing with Omega?
Because if Omega uses Alice’s source code, then Alice sees that the source code of the simulation is exactly the same as hers, whereas Bob sees that there is a comment difference, so the situation is not symmetric.
So why doesn’t that happen in the prisoner’s dilemma?
Because Alice sees that Bob’s source code is the same as hers except for a comment difference, and Bob sees that Alice’s source code is the same as his except for a comment difference, so the situation is symmetric.
Newcomb:
Because if Omega uses Alice’s source code, then Alice sees that the source code of the simulation is exactly the same as hers, whereas Bob sees that there is a comment difference, so the situation is not symmetric.
Prisoner’s Dilemma:
Because Alice sees that Bob’s source code is the same as hers except for a comment difference, and Bob sees that Alice’s source code is the same as his except for a comment difference, so the situation is symmetric.
Do you see the contradiction here?
Newcomb, Alice: The simulation’s source code and available information are literally exactly the same as Alice’s, so if Alice two-boxes, the simulation will too. There’s no way around it. So Alice one-boxes.
Newcomb, Bob: The simulation was in the situation described above. Bob thus predicts that it will one-box. Bob himself is in an entirely different situation, since he can see a source code difference, so if he two-boxes, it does not logically imply that the simulation will two-box. So Bob two-boxes and the simulation one-boxes.
Prisoner’s Dilemma: Alice sees Bob’s source code, and summarizes it as “identical to me except for a different comment”. Bob sees Alice’s source code, and summarizes it as “identical to me except for a different comment”. Both Alice and Bob run the same algorithm, and they now have the same input, so they must produce the same result. They figure this out, and cooperate.
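Here is a minimal sketch of the decision rule being described in these three cases, in Python. Every name in it (strip_comments, newcomb_choice, prisoners_dilemma_choice) is invented for illustration, and a real TDT would need a much more careful notion of logical linkage than string comparison:

    # Illustrative sketch only; TDT is not actually specified at this level of detail.

    def strip_comments(source: str) -> str:
        """Drop comment lines, keeping only the functional part of the code."""
        return "\n".join(
            line for line in source.splitlines()
            if not line.lstrip().startswith("#")
        )

    def newcomb_choice(my_source: str, sim_source: str) -> str:
        """Alice's and Bob's behaviour in the Newcomb variant described above."""
        if sim_source == my_source:
            # Alice's case: two-boxing would logically imply that the sim
            # two-boxes too, emptying box B, so she one-boxes.
            return "one-box"
        # Bob's case: he can see a difference, predicts the sim one-boxes,
        # and takes both boxes.
        return "two-box"

    def prisoners_dilemma_choice(my_source: str, opponent_source: str) -> str:
        """Cooperate iff the opponent is the same algorithm, comments aside."""
        if strip_comments(opponent_source) == strip_comments(my_source):
            return "cooperate"  # same algorithm, same input, so linked decisions
        return "defect"

On this sketch both Alice and Bob cooperate in the prisoner’s dilemma, while only Bob two-boxes in the Newcomb variant, which is exactly the asymmetry being claimed.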
Ignore Alice’s perspective for a second. Why is Bob acting differently? He’s seeing the same code both times.
Don’t ignore Alice’s perspective. Bob knows what Alice’s perspective is, so since there is a difference in Alice’s perspective, there is by extension a difference in Bob’s perspective.
Bob looks at the same code both times. In the PD, he treats it as identical to his own. In NP, he treats it as different. Why?
The source code that Bob is looking at is the same in each case, but the source code that [the source code that Bob is looking at] is looking at is different in the two situations.
NP: Bob is looking at Alice, who is looking at Alice, who is looking at Alice, …
PD: Bob is looking at Alice, who is looking at Bob, who is looking at Alice, …
Clarifying edit: In both cases, Bob concludes that the source code he is looking at is functionally equivalent to his own. But in NP, Bob treats the input to the program he is looking at as different from his input, whereas in PD, Bob treats the input to the program he is looking at as functionally equivalent to his input.
But you said Bob concludes that their decision theories are functionally identical, and thus it reduces to:
PD: TDT is looking at TDT, who is looking at TDT, who is looking at TDT, …
And yet this does not occur in NP.
EDIT:
The source code that Bob is looking at is the same in each case, but the source code that [the source code that Bob is looking at] is looking at is different in the two situations.
The point is that his judgement of the source code changes, from “some other agent” to “another TDT agent”.
Looks like my edit was poorly timed.
One way of describing it is that the comment is extra information that is distinct from the decision agent, and that Bob can make use of this information when making his decision.
Oops, didn’t see that.
What’s the point of adding comments if Bob’s just going to conclude their code is functionally identical anyway? Doesn’t that mean that you might as well use the same code for Bob and Alice, and call it TDT?
In NP, the comments are there to provide Bob an excuse to two-box that does not result in the simulation two-boxing. In PD, the comments are there to illustrate that TDT needs a sophisticated algorithm for identifying copies of itself, one that can recognize different implementations of the same algorithm.
Do you understand why Bob acts differently in the two situations, now?
I was assuming Bob was an AI, lacking a ghost to look over his code for reasonableness. If he’s not, then he isn’t strictly implementing TDT, is he?
Bob is an AI. He’s programmed to look for similarities between other AIs and himself so that he can treat their action and his as logically linked when it is to his advantage to do so. I was arguing that a proper implementation of TDT should consider Bob’s and Alice’s decisions linked in PD and nonlinked in the NP variant. I don’t really understand your objection.
My objection is that an AI looking at the same question—is Alice functionally identical to me?—can’t look for excuses for why they’re not really the same whenever that would be useful, if they actually behave the same way. His answer should be the same in both cases, because they are either functionally identical or not.
The proper question is “In the context of the problems each of us face, is there a logical connection between my actions and Alice’s actions?”, not “Is Alice functionally identical to me?”
I think those terms both mean the same thing.
For reference, by “functionally identical” I meant “likely to choose the same way I do”. Thus, an agent that will abandon the test to eat beans is functionally identical when beans are unavailable.
I guess my previous response was unhelpful. Although “Is Alice functionally identical to me?” is not the question of primary concern, it is a relevant question. But another relevant question is “Is Alice facing the same problem that I am?” Two functionally identical agents facing different problems may make different choices.
In the architecture I’ve been envisioning, Alice and Bob can classify other agents as “identical to me in both algorithm and implementation” or “identical to me in algorithm, with differing implementation”, or one of many other categories. For each of the two categories I named, they would assume that an agent in that category will make the same decision as they would when presented with the same problem (so they would both be subcategories of “functionally identical”). In both situations, each agent classifies the other as identical in algorithm and differing in implementation.
In the prisoners’ dilemma, each agent is facing the same problem, that is, “I’m playing a prisoner’s dilemma with another agent that is identical to me in algorithm but differing in implementation”. So they treat their decisions as linked.
In the Newcomb’s problem variant, Alice’s problem is “I’m in Newcomb’s problem, and the predictor used a simulation that is identical to me in both algorithm and implementation, and which faced the same problem that I am facing.” Bob’s problem is “I’m in Newcomb’s problem, and the predictor used a simulation that is identical to me in algorithm but differing in implementation, and which faced the same situation as Alice.” There was a difference in the two problem descriptions even before the part about what problem the simulation faced, so when Bob notes that the simulation faced the same problem as Alice, he finds a difference between the problem that the simulation faced and the problem that he faces.
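A rough sketch of how this classification plus problem-description comparison might be wired up (again, every name here is invented for illustration, not part of any actual TDT specification):

    # Illustrative sketch of the classification architecture described above.

    def strip_comments(source: str) -> str:
        return "\n".join(
            line for line in source.splitlines()
            if not line.lstrip().startswith("#")
        )

    def classify(my_source: str, other_source: str) -> str:
        """Coarse relation between me and another agent."""
        if other_source == my_source:
            return "same algorithm, same implementation"
        if strip_comments(other_source) == strip_comments(my_source):
            return "same algorithm, different implementation"
        return "other agent"

    def decisions_linked(my_problem: tuple, other_problem: tuple) -> bool:
        """Treat another agent's decision as linked to mine only when our
        problem descriptions match exactly."""
        return my_problem == other_problem

    # Prisoner's dilemma: both agents describe their problem as
    #   ("PD", "same algorithm, different implementation"),
    # so decisions_linked(...) is True and they cooperate.
    #
    # Newcomb variant: Alice's description is
    #   ("NP", "same algorithm, same implementation", "sim faced my problem"),
    # while Bob's is
    #   ("NP", "same algorithm, different implementation", "sim faced Alice's problem").
    # The descriptions already differ, so Bob does not treat his choice as
    # linked to the simulation's, and he two-boxes.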
Then why are we talking about “Bob” and “Alice” when they’re both just TDT agents?
Because if Bob does not ignore the implementation difference, he ends up with more money in the Newcomb’s problem variant.
But there is no difference between “Bob looking at Alice looking at Bob” and “Alice looking at Alice looking at Alice”. That’s the whole point of TDT.
There is a difference. In the first one, the agents have a slight difference in their source code. In the second one, the source code of the two agents is identical.
If you’re claiming that TDT does not pay attention to such differences, then we only have a definitional dispute, and by your definition, an agent programmed the way I described would not be TDT. But I can’t think of anything about the standard descriptions of TDT that would indicate such a restriction. It is certainly not the “whole point” of TDT.
For now, I’m going to call the thing you’re telling me TDT is “TDT1”, and I’m going to call the agent architecture I was describing “TDT2”. I’m not sure if this is good terminology, so let me know if you’d rather call them something else.
Anyway, consider the four programs Alice1, Bob1, Alice2, and Bob2. Alice1 and Bob1 are implementations of TDT1, and are identical except for having a different identifier in the comments (and this difference changes nothing). Alice2 and Bob2 are implementations of TDT2, and are identical except for having a different identifier in the comments.
Consider the Newcomb’s problem variant with the first pair of agents (Alice1 and Bob1). Alice1 is facing the standard Newcomb’s problem, so she one-boxes and gets $1,000,000. As far as Bob1 can tell, he also faces the standard Newcomb’s problem (there is a difference, but he ignores it), so he one-boxes and gets $1,000,000.
Now consider the same problem, but with all instances of Alice1 replaced with Alice2, and all instances of Bob1 replaced with Bob2. Alice2 still faces the standard Newcomb’s problem, so she one-boxes and gets $1,000,000. But Bob2 two-boxes and gets $1,001,000.
The problem seems pretty fair; it doesn’t specifically reference either TDT1 or TDT2 in an attempt to discriminate. However, when we replace the TDT1 agents with TDT2 agents, one of them does better and neither of them does worse, which seems to indicate a pretty serious deficiency in TDT1.
Either TDT decides if something is identical based on its actions, in which case I am right, or its source code, in which case you are wrong, because such an agent would not cooperate in the Prisoner’s Dilemma.
They decide using the source code. I already explained why this results in them cooperating in the Prisoner’s Dilemma.
Wait! I think I get it! In a Prisoner’s Dilemma, both agents are facing another agent, whereas in Newcomb’s Problem, Alice is facing an infinite chain of herself, while Bob is facing an infinite chain of someone else. It’s like the “favorite number” example in the follow-up post.
Yes.
Well that took embarrassingly long.
The right place to introduce the separation is not in between TDT and TDT-prime, but in between TDT-prime’s output and TDT-prime’s decision. If its output is a strategy, rather than a number of boxes, then that strategy can include a byte-by-byte comparison; and if TDT and TDT-prime both do it that way, then they both win as much as possible.
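A tiny sketch of what “output a strategy rather than a number of boxes” might look like (hypothetical code, not a detailed statement of anyone’s actual proposal):

    # Illustrative sketch: every TDT variant returns the *same* strategy object;
    # the byte-by-byte comparison only runs when the strategy is executed.

    from typing import Callable

    def tdt_output() -> Callable[[str, str], str]:
        def strategy(my_source: str, sim_source: str) -> str:
            # One-box exactly when I am, byte for byte, the code Omega simulated.
            return "one-box" if my_source == sim_source else "two-box"
        return strategy

    # TDT and TDT-prime both emit this identical strategy, so their outputs are
    # logically linked; when executed, the simulated variant one-boxes (filling
    # box B) and every other variant two-boxes.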
But doesn’t that make cliquebots, in general?
I’m thinking hard about this one…
Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection? Can that approach really work in general without creating CliqueBots? Don’t know yet without detailed analysis.
Another issue is that Omega is not obliged to reveal the source code of the sim; it could instead provide some information about the method used to generate / filter the sim code (e.g. a distribution the sim was drawn from) and still present a well-defined problem. Each TDT variant would then not know whether it was the sim.
I’m aiming for a follow-up article addressing this strategy (among others).
Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection?
This sounds equivalent to asking “can a Turing machine generate non-deterministic random numbers?” Unless you’re thinking about coding TDT agents one at a time and setting some constant differently in each one.
Well, I’ve had a think about it, and I’ve concluded that it would matter how great the difference between TDT and TDT-prime is. If TDT-prime is almost the same as TDT, but has an extra stage in its algorithm in which it converts all dollar amounts to yen, it should still be able to prove that it is isomorphic to Omega’s simulation, and therefore will not be able to take advantage of “logical separation”.
But if TDT-prime is different in a way that makes it non-isomorphic, i.e. it sometimes gives a different output given the same inputs, that may still not be enough to “separate” them. If TDT-prime acts the same as TDT, except when there is a walrus in the vicinity, in which case it tries to train the walrus to fight crime, it is still the case in this walrus-free problem that it makes exactly the same choice as the simulation (?). It’s as if you need the ability to prove that two agents necessarily give the same output for the particular problem you’re faced with, without proving what output those agents actually give, and that sure looks crazy-hard.
EDIT: I mean crazy-hard for the general case, but much, much easier for all the cases where the two agents are actually the same.
EDIT 2: On the subject of fairness, my first thoughts: A fair problem is one in which if you had arrived at your decision by a coin flip (which is as transparently predictable as your actual decision process—i.e. Omega can predict whether it’s going to come down heads or tails with perfect accuracy), you would be rewarded or punished no more or less than you would be using your actual decision algorithm (and this applies to every available option).
EDIT 3: Sorry to go on like this, but I’ve just realised that won’t work in situations where some other agent bases their decision on whether you’re predicting what their decision will be, i.e. Prisoner’s Dilemma.
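For what it’s worth, here is one way to write down EDIT 2’s coin-flip criterion. The notation (U_P, C_a, A_P) is invented here, and EDIT 3’s caveat about other predicting agents still stands:

    % Tentative formalisation of the coin-flip fairness criterion.
    % U_P(D, a): the payoff in problem P when the agent runs decision algorithm D
    %            and ends up choosing option a
    % C_a:       a "coin" agent that transparently (predictably to Omega) outputs a
    % A_P:       the set of options available in P
    \[
      \mathrm{Fair}_D(P) \;\iff\; \forall a \in A_P :\; U_P(D, a) = U_P(C_a, a)
    \]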