Thank you! I’m interested in checking out earlier chapters to make sure I understand the notation, but here’s my current understanding:
There are 7 axioms that go into Joyce’s representation theorem, and none of them seem to put any constraints on the set of actions available to the agent. So we should be able to ask a Joyce-rational agent to choose a policy for a game.
My impression of the representation theorem is that a formula like can represent a variety of decision theories, including ones like CDT that are dynamically inconsistent: they have a well-defined answer to “what do you think is the best policy?”, and it’s not necessarily consistent with their answer to “what are you actually going to do?”
So it seems like the axioms are consistent with policy optimization, and they’re also consistent with action optimization. We can ask a decision theory to optimize a policy using an analogous expression: .
It seems like we should be able to get a lot of leverage by imposing a consistency requirement that these two expressions line up. It shouldn’t matter whether we optimize over actions or over policies: the actions taken should be the same either way.
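A toy sketch of what checking that requirement could look like, using a counterfactual-mugging-style setup. Everything here is an illustrative assumption rather than anything taken from Joyce: the payoffs, the names `policy_value` and `action_value`, and the choice to evaluate actions by updating on the observation first.

```python
from itertools import product

# Hypothetical toy problem (counterfactual mugging); payoffs are made up.
P_HEADS = 0.5
OBSERVATIONS = ["asked_to_pay"]  # the agent only acts after seeing tails
ACTIONS = ["pay", "refuse"]


def policy_value(policy):
    """Expected utility of a whole policy, evaluated before the coin flip."""
    pays = policy["asked_to_pay"] == "pay"
    heads_payoff = 10_000 if pays else 0  # predictor rewards a policy of paying
    tails_payoff = -100 if pays else 0    # paying costs 100 when actually asked
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def action_value(action):
    """Utility of a single action, evaluated after seeing tails and updating on it."""
    return -100 if action == "pay" else 0


# Enumerate every mapping from observations to actions.
policies = [
    dict(zip(OBSERVATIONS, choice))
    for choice in product(ACTIONS, repeat=len(OBSERVATIONS))
]

best_policy = max(policies, key=policy_value)
best_action = max(ACTIONS, key=action_value)

# The proposed consistency requirement: both optimizations pick the same action.
consistent = best_policy["asked_to_pay"] == best_action
print(best_policy, best_action, consistent)
```

In this toy case the policy optimizer pays and the action optimizer refuses, so the check fails. A decision theory satisfying the proposed requirement would have to define its counterfactuals so that the two answers coincide.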
I don’t expect that fully specifies how to calculate the counterfactual data structures and , even with Joyce’s other 7 axioms. But the first 7 didn’t rule out dynamic or counterfactual inconsistency, and this should at least narrow our search down to decision theories that are able to coordinate with themselves at other points in the game tree.
What are some obstacles to superintelligences performing effective logical handshakes? Or equivalently, what are some necessary conditions that seem difficult to bring about, even for very smart software systems?
(My understanding of the term “logical handshake” is as a generalization of the technique from the Robust Cooperation paper. Something like “I have a model of the other relevant decision-makers, and I will enact my part of the joint policy ϕ if I’m sufficiently confident that they’ll all enact their part of ϕ.” Is that the sort of decision procedure that seems likely to fall into commitment races?)
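For concreteness, the rule in the parenthetical above might be sketched as a threshold test. This is only a minimal sketch under stated assumptions: `handshake_action`, the `credences` dictionary, the `"fallback"` action, and the 0.95 threshold are all hypothetical, and it is not the provability-based construction from the Robust Cooperation paper.

```python
def handshake_action(credences, threshold=0.95):
    """Enact my part of phi only if every other agent seems likely enough to enact theirs.

    credences: maps each other agent's name to my probability that it enacts
    its part of the joint policy phi (hypothetical inputs).
    """
    if all(p >= threshold for p in credences.values()):
        return "enact"
    return "fallback"


print(handshake_action({"alice": 0.99, "bob": 0.97}))
print(handshake_action({"alice": 0.99, "bob": 0.60}))
```

The sketch deliberately leaves open where the credences come from. When each agent's credence depends on modeling the others running the same rule, computing them is the hard part.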