I’m curious why you say it handles Newcomb’s problem well. The Nirvana trick seems like an artificial intervention where we manually assign certain situations a utility of infinity to enforce a consistency condition, which then ensures they are ignored when calculating the maximin. If we are manually intervening, why not just manually cross out the cases we wish to ignore, instead of adding them with infinite value and then immediately ignoring them?
Just because we modelled this using infrabayesianism, it doesn’t follow that it contributed anything to the solution. It feels like we just got out what we put in, but that this is obscured by a philosophical shell game. The reason it feels compelling is that, even though we’re only adding in an option in order to then immediately ignore it, this is sufficient to give us a false sense of having made a non-trivial decision.
It would seem that infrabayesianism might be contributing to our understanding of the problem if the infinite utility arose organically, but as far as I can tell, this is a purely artificial intervention.
I think this is made clearer by Thomas Larson’s explanation of infrabayesianism failing the Transparent Newcomb problem. It seems clear to me that this isn’t an edge case; rather, it demonstrates that the trick doesn’t solve counterfactuals at all, but only gives you back what you put in (one-boxing in the case where you see proof that you one-box, two-boxing in the case where you see proof that you two-box).
(Vanessa claims to have a new intervention that makes the Nirvana trick redundant; if this doesn’t fall prey to the same issues, I’d love to know.)
You don’t need the Nirvana trick if you’re using homogeneous or fully general ultracontributions and you allow “convironments” (semimeasure-environments) in your notion of law causality. Instead of positing a transition to a “Nirvana” state, you just make the transition kernel vanish identically in those situations.
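Schematically (in ad hoc notation), the decision rule in both cases is the same maximin over the environments sanctioned by the law; the two devices are just alternative ways of neutralizing the prediction-violating branches, which under the Nirvana trick can never realize the minimum and under a vanishing kernel simply receive zero measure:

$$
\pi^{*} \in \arg\max_{\pi}\ \min_{e \in \mathcal{E}}\ \mathbb{E}_{e,\pi}[U],
\qquad
\underbrace{U(\text{Nirvana}) = +\infty}_{\text{Nirvana trick}}
\quad\text{vs.}\quad
\underbrace{\kappa_{e}(\,\cdot \mid h, a) \equiv 0}_{\text{vanishing kernel}}
\ \text{on prediction-violating } (h, a).
$$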
However, this is a detail; there is a more central point that you’re missing. From my perspective, the reason Newcomb-like thought experiments are important is that they demonstrate situations in which classical formal approaches to agency produce answers that seem silly. Usually, the classical approaches examined in this context are CDT and EDT. However, CDT and EDT are both too toyish for this purpose, since they ignore learning and instead assume the agent already knows how the world works, and moreover that this knowledge is represented in the preferred form of the corresponding decision theory. Instead, we should be thinking about learning agents, and the classical framework for those is reinforcement learning (RL). With RL, we can operationalize the problem thus: if a classical RL agent is put into an arbitrary repeated[1] Newcomb-like game, it fails to converge to the optimal reward (although it does succeed for the original Newcomb problem!)
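For concreteness, here is one minimal way to set up the repeated original Newcomb game (a toy sketch only: the predictor is idealized as always matching the agent’s actual choice in that round, and the payoffs, names and learner are illustrative choices, not anything from an existing formalism or codebase):

```python
# Toy sketch: repeated original Newcomb with an idealized perfect predictor.
import random

ACTIONS = ("one-box", "two-box")

def newcomb_round(action):
    """Omega fills the opaque box iff it (correctly) predicts one-boxing."""
    omega_fills = (action == "one-box")
    opaque = 1_000_000 if omega_fills else 0
    transparent = 0 if action == "one-box" else 1_000
    return opaque + transparent

def run_empirical_learner(rounds=10_000, eps=0.1):
    """Eps-greedy choice of the action with the best empirical average reward."""
    total = {a: 0.0 for a in ACTIONS}
    count = {a: 0 for a in ACTIONS}
    score = 0.0
    for _ in range(rounds):
        if random.random() < eps or min(count.values()) == 0:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: total[a] / count[a])
        r = newcomb_round(action)
        total[action] += r
        count[action] += 1
        score += r
    return score / rounds

# Here one-boxing always yields 1,000,000 and two-boxing always yields 1,000,
# so even this naive learner converges to (mostly) one-boxing, matching the
# remark that classical RL succeeds on the repeated original Newcomb problem.
print(run_empirical_learner())  # ~1,000,000 per round, minus exploration cost
```

The interesting cases are then the Newcomb-like games where the reward depends on the agent’s policy rather than only on the action it actually took in that round (counterfactual mugging being the standard example), which, per the claim above, is exactly where a classical RL agent fails to converge to optimal reward.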
On the other hand, an infra-Bayesian RL agent provably does converge to optimal reward in those situations, assuming pseudocausality. Ofc IBRL is just a desideratum, not a concrete algorithm. But examples like Tian et al and my own upcoming paper about IB bandits show that there are algorithms with reasonably good IB regret bounds for natural hypothesis classes. While an algorithm with a good regret bound[2] for ultra-POMDPs[3] has not yet been proposed, it seems very likely that it exists.
Now, about non-pseudocausal scenarios (such as noiseless transparent Newcomb). While this is debatable, I’m leaning towards the view that we actually shouldn’t expect agents to succeed there. This became salient to me when looking at counterfactuals in infra-Bayesian physicalism. [EDIT: actually, things are different in IBP, see comment below.] The problem with non-pseudocausal updatelessness is that you expect the agent to follow the optimal policy even after making an observation that, according to the assumptions, can never happen, not even with low probability. This sounds like it might make sense when viewing an individual problem, but in the context of learning it is impossible. Learning requires that an agent that sees an observation which is impossible according to hypothesis H discards hypothesis H and acts on the other hypotheses it has. There is being updateless, and then there is being too updateless :)
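As a schematic sketch of that requirement (the allows predicate is an illustrative stand-in for “the observation has nonzero mass under the hypothesis”; nothing here is specific to infra-Bayesianism):

```python
def update_on_observation(hypotheses, obs, allows):
    """Discard every hypothesis under which the observation was impossible.

    `allows(h, obs)` is an illustrative stand-in for "obs has nonzero
    probability (or nonzero mass) under hypothesis h".
    """
    surviving = [h for h in hypotheses if allows(h, obs)]
    # The learner then plans against `surviving`; an agent that instead keeps
    # executing the policy that was optimal under a now-falsified hypothesis
    # is, in the terminology above, "too updateless".
    return surviving
```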
Scott Garrabrant wrote somewhere recently that there is tension between Bayesian updates and reflective consistency, and that he thinks reflective consistency is so important that we should sacrifice Bayesian updates. I agree that there is tension, that reflective consistency is really important, and that Bayesian updates should be partially sacrificed, but it’s possible to take this too far. In Yudkowsky’s original paper on TDT he gives the example of an alphabetizing agent as something that can be selected for by certain decision problems. Ofc this doesn’t prove non-alphabetizing is irrational. He argues that we need some criterion of “fairness” to decide which decision problems count. I think that pseudocausality should be a part of the fairness criterion, because without it we don’t get learnability[4]: and learnability is so important that I’m willing to sacrifice reflective consistency in non-pseudocausal scenarios!
Instead of literal repetition, we could examine more complicated situations where information accumulates over time so that the nature of the game can be confidently inferred in the limit. But, the principle is the same.
If you don’t care about the specific regret bound then it’s easy to come up with an algorithm based on Exp3, but that’s just reducing the problem to blind trial and error over different policies, which is missing the point. The point being the ability to exploit regularities in the world, which should also apply to Newcomb-like scenarios.
You need ultra-POMDPs to model e.g. counterfactual mugging. Even ordinary POMDPs have been relatively neglected in the literature, because the control problem is PSPACE-hard. Dealing with that is an interesting question, but it seems orthogonal to the philosophical issues that arise from Newcomb.
Although there is still room for a fairness criterion weaker than pseudocausality but stronger than the imagined fairness criterion of UDT.
Thanks for the detailed response.
To be honest, I’ve been persuaded that we disagree enough in our fundamental philosophical approaches that I’m not planning to dive deeply into infrabayesianism, so I can’t respond to many of your technical points (though I am planning to read the remaining parts of Thomas Larson’s summary and to see whether any of your talks have been recorded).
“However, CDT and EDT are both too toyish for this purpose, since they ignore learning and instead assume the agent already knows how the world works, and moreover that this knowledge is represented in the preferred form of the corresponding decision theory”—this is one insight I took from infrabayesianism. I would have highlighted this in my comment, but I forgot to mention it.
“Learning requires that an agent that sees an observation which is impossible according to hypothesis H discards hypothesis H and acts on the other hypotheses it has”—I have higher expectations from learning agents: that they learn to solve such problems despite the difficulties.
I’m saying that there’s probably a literal impossibility theorem lurking there.
But, after reading my comment above, my spouse Marcus correctly pointed out that I am mischaracterizing IBP. As opposed to IBRL, in IBP, pseudocausality is not quite the right fairness condition. In fact, in a straightforward operationalization of repeated full-box-dependent transparent Newcomb, an IBP agent would one-box. However, there are more complicated situations where it would deviate from full-fledged UDT.
Example 1: You choose whether to press button A or button B. After this, you play Newcomb. Omega fills the box iff you one-box both in the scenario in which you pressed button A and in the scenario in which you pressed button B. Randomizing is not allowed. A UDT agent will one-box. An IBP agent might two-box, because it considers the hypothetical in which it pressed a button different from what it actually intended to press to be “not really me” and therefore unpredictable. (Essentially, the policy is ill-defined off-policy.)
Example 2: You see either a green light or a red light, and then choose between button A and button B. After this, you play Newcomb. Omega fills the box iff you either one-box after seeing green and pressing A or one-box after seeing green and pressing B. However, you always see red. A UDT agent will one-box if it saw the impossible green and two-box if it saw red. An IBP agent might two-box either way, because if it remembers seeing green then it decides that all of its assumptions about the world need to be revised.
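Writing the two fill conditions schematically, with $a(\cdot)$ for the action the policy takes in the indicated branch and $1$ standing for one-boxing:

$$
\text{Example 1: fill} \iff a(A) = 1 \,\wedge\, a(B) = 1,
\qquad
\text{Example 2: fill} \iff a(\text{green}, A) = 1 \,\vee\, a(\text{green}, B) = 1,
$$

where in Example 2 the green branch never actually occurs. The divergence from UDT lives entirely in how the off-policy branch (Example 1) and the impossible observation (Example 2) are evaluated.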