A “true” EDT agent needs to update on all the evidence it has ever observed, and it’s very unclear to me how to do this in practice.
The only way I know how to explore what this means is to use simple toy problems and be very careful about never ever using the concept “reference class.” Oh, and writing down algorithms helps stop you from sneaking in extra information.
Example algorithm (basic EDT):
We want to pick one action out of a list of possible actions (provided to us as a1, a2, ...), which can lead to various outcomes that we have preferences over, quantified by utilities (provided to us as [o1, u1], [o2, u2], ...). To connect actions to outcomes we are provided a matrix of conditional probabilities P(outcome|action) (P(o1|a1), P(o1|a2), ... P(o2|a1), ...). We then assign an expected utility to each action by summing the utilities of the outcomes, weighted by their conditional probabilities given that action, and pick the action with the highest expected utility.
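As a concrete illustration, here is a minimal Python sketch of that algorithm. The function and variable names are my own, and the numbers in the example are made up purely for illustration:

```python
# Minimal sketch of the basic expected-utility selection described above.
# All names and numbers here are illustrative, not from the original post.

def best_action(actions, utilities, cond_prob):
    """Return the action with the highest expected utility.

    actions:   list of action labels, e.g. ["a1", "a2"]
    utilities: dict mapping outcome label -> utility, e.g. {"o1": 10.0}
    cond_prob: dict mapping (outcome, action) -> P(outcome | action)
    """
    def expected_utility(a):
        # Sum each outcome's utility weighted by its probability given a.
        return sum(u * cond_prob[(o, a)] for o, u in utilities.items())

    return max(actions, key=expected_utility)

# Toy example with made-up numbers.
actions = ["a1", "a2"]
utilities = {"o1": 10.0, "o2": 0.0}
cond_prob = {("o1", "a1"): 0.9, ("o2", "a1"): 0.1,
             ("o1", "a2"): 0.1, ("o2", "a2"): 0.9}
print(best_action(actions, utilities, cond_prob))  # -> a1
```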
Note that this algorithm is identical to the analogous CDT algorithm. The difference is a cosmetic variable name change. Thus if we want to differentiate between them at all, we’ll need a second algorithm to feed our first one the matrix P(o|a).
The second algorithm for basic EDT has the form:
Start with some information Z about yourself. For an agent described by Z, use a probability-calculating program to find P(o|aZ) for that agent. (Note that this second algorithm could have contained the concept “reference class,” but doesn’t, and thus will actually be useful.)
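To make the second algorithm similarly concrete, here is a sketch under the (strong) simplifying assumption that the “probability-calculating program” is handed an explicit joint distribution P(Z, a, o) over finitely many cases. All names are hypothetical; a real agent would need a far more sophisticated world model:

```python
# Hypothetical sketch of the second algorithm: given self-knowledge Z,
# produce the matrix P(o | aZ) by conditioning an explicit joint
# distribution. This is an assumption of mine about how the
# "probability-calculating program" might be represented in a toy problem.

def outcome_probs_given_self(joint, Z, actions, outcomes):
    """Return P(o | a, Z) as a dict keyed by (outcome, action).

    joint: dict mapping (z, a, o) -> joint probability P(z, a, o)
    Z:     the agent's self-description, one of the z labels in joint
    """
    probs = {}
    for a in actions:
        # P(a, Z) is the marginal over outcomes; conditioning on Z and a
        # is then just division by that marginal.
        p_aZ = sum(joint.get((Z, a, o), 0.0) for o in outcomes)
        for o in outcomes:
            p_joint = joint.get((Z, a, o), 0.0)
            probs[(o, a)] = p_joint / p_aZ if p_aZ > 0 else 0.0
    return probs
```

The output of this sketch is exactly the cond_prob dictionary that the first algorithm consumes, which is the sense in which the second algorithm feeds the first one the matrix P(o|a).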