...But if you model the decision as being made over an amount of time, then you could possibly make the decision early on in your deliberation, so that the predictor can see.
Yes, sounds sort of reasonable. Here is how I think you can realize this using TRL.
As usual, we consider the agent playing an IPD against a predictor (Newcomb’s paradox is essentially playing the Prisoner’s Dilemma against a “FairBot”). On each round, the predictor gets to see the agent’s state at the start of the round. (The state can be considered part of the “source code”. For randomizing agents, we also assume the predictor sees the random bits.) The predictor then tries to simulate the agent (we assume it knows the rest of the agent’s source code as well), and succeeds if the agent doesn’t execute any programs that are too expensive for the predictor (for the sake of simplicity, assume that no program started on one round continues running during following rounds: I don’t think this assumption makes a difference of principle). Otherwise, the prediction might be wrong (for example, we can assume it defaults to D). The predictor then plays D or C according to its prediction of the agent’s action.
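To make the setup concrete, here is a minimal toy sketch of a single round in Python. Everything in it (the Action enum, the cost model where the agent’s policy reports the costs of the programs it would run, the predictor_budget parameter) is my own illustrative choice, not something specified above.

```python
from enum import Enum

class Action(Enum):
    C = "cooperate"
    D = "defect"

def predictor_move(agent_policy, agent_state, random_bits, predictor_budget):
    """Simulate the agent from its observed state; if the agent runs programs
    too expensive for the predictor, the simulation fails and the prediction
    defaults to D."""
    planned_action, program_costs = agent_policy(agent_state, random_bits)
    if all(cost <= predictor_budget for cost in program_costs):
        prediction = planned_action   # simulation succeeds
    else:
        prediction = Action.D         # simulation fails: default to D
    return prediction                 # the predictor plays its prediction

def play_round(agent_policy, agent_state, random_bits, predictor_budget):
    agent_action, _ = agent_policy(agent_state, random_bits)
    predictor_action = predictor_move(agent_policy, agent_state,
                                      random_bits, predictor_budget)
    return agent_action, predictor_action

# Example: an agent that plays C and only runs cheap programs is simulated
# correctly, so the predictor cooperates too.
cheap_cooperator = lambda state, bits: (Action.C, [1, 2])
print(play_round(cheap_cooperator, agent_state=None, random_bits=None,
                 predictor_budget=10))
# -> (<Action.C: 'cooperate'>, <Action.C: 'cooperate'>)
```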
In this setting, the agent can learn the incomplete hypothesis “if I don’t run expensive programs and I play C, the predictor will also play C”. (We assume that the prior allows for side effects of executing programs. Such a prior seems more realistic anyway, and in particular is required to counter non-Cartesian daemons. However, it also has a cost, so perhaps what we really want is a prior that is biased towards few side effects: but this is immaterial for the current discussion.) This hypothesis guarantees a payoff of U(CC). Assuming that the predictor cannot be exploited, this is the best possible payoff, and therefore the agent will converge to cooperation.
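A hedged sketch of why this hypothesis pins down the agent’s choice, using illustrative PD payoffs U(CC)=2, U(CD)=0, U(DC)=3, U(DD)=1 (these numbers, and the policy labels, are mine, not from the discussion above). The point is just that the learned hypothesis gives a lower bound of U(CC) for the “cheap programs + C” policy, while no other policy can guarantee more than U(DD) against an unexploitable predictor:

```python
# Illustrative PD payoffs for the agent (first entry = agent action,
# second entry = predictor action).
U = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

def guaranteed_payoff(policy):
    """Per-round payoff lower bound, given only the learned incomplete
    hypothesis: 'cheap programs + play C  =>  predictor plays C'."""
    if policy == "cheap_and_C":
        return U[("C", "C")]   # the hypothesis guarantees CC
    if policy == "play_D":
        return U[("D", "D")]   # the predictor may predict D and play D
    return U[("C", "D")]       # worst case for any other C-playing policy

policies = ["cheap_and_C", "play_D", "expensive_and_C"]
best = max(policies, key=guaranteed_payoff)
assert best == "cheap_and_C"   # so the agent converges to cooperation
```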
We might want a more explicit realization of the “simulates” part in “agent simulates predictor”. For this, we can assume the agent also receives its own state as an observation (but I’m not sure how generally useful this is for realistic agents). The agent can then also learn the incomplete hypothesis describing the exact function from agent states to predictor outputs. However, this hypothesis doesn’t affect the outcome: it doesn’t predict the agent’s state, and therefore can only guarantee the payoff U(DD).
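For this last point, here is a hedged sketch (same illustrative payoffs and made-up names as above) of one way to read why the “exact function from agent states to predictor outputs” hypothesis only certifies U(DD): it doesn’t constrain which state the agent will actually be in, so the guarantee is a worst case over states.

```python
def guaranteed_payoff_explicit(predictor_output, states, U):
    """Worst case over agent states: within each state the agent can
    best-respond to the known predictor output, but the hypothesis doesn't
    predict the state, so only the minimum over states is guaranteed."""
    def best_response_value(state):
        pred = predictor_output(state)
        return max(U[(a, pred)] for a in ("C", "D"))
    return min(best_response_value(s) for s in states)

# Example: if any reachable state is mapped to D, the guarantee is U(DD).
U = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
fair_bot_like = lambda state: "C" if state == "will_cooperate" else "D"
print(guaranteed_payoff_explicit(fair_bot_like,
                                 ["will_cooperate", "will_defect"], U))
# -> 1, i.e. U(DD)
```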