If that’s value(action) = sum_j( prob(outcome_j GIVEN action) * D(outcome_j) ), what is D? It is not at all obvious to me that there’s a straightforward way to parameterize D for learning that is self-consistent in moral dilemmas.
value(action) = sum_j( prob(outcome_j GIVEN action) * D(outcome_j) )
That should be U—it is the utility function which computes the utility of a future universe.
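For concreteness, here is a minimal sketch of the expected-value formula above, with U standing in for the term the question calls D. Everything beyond the formula itself is a hypothetical illustration: the `outcome_model` function, the toy outcome list, and the hand-written utility table are assumptions, not anything from the original exchange.

```python
# Sketch of value(action) = sum_j P(outcome_j | action) * U(outcome_j).
# All names and numbers below are illustrative placeholders.

def expected_value(action, outcomes, outcome_model, U):
    """Expected utility of an action over a finite set of outcomes."""
    return sum(outcome_model(outcome, action) * U(outcome) for outcome in outcomes)

# Toy setup: two actions, three possible outcomes, a hand-written utility table.
outcomes = ["status_quo", "small_harm", "large_benefit"]
utilities = {"status_quo": 0.0, "small_harm": -1.0, "large_benefit": 5.0}

def outcome_model(outcome, action):
    # Placeholder conditional probabilities P(outcome | action).
    table = {
        "act":     {"status_quo": 0.2, "small_harm": 0.30, "large_benefit": 0.50},
        "abstain": {"status_quo": 0.9, "small_harm": 0.05, "large_benefit": 0.05},
    }
    return table[action][outcome]

def U(outcome):
    # The utility function over future states; this is the piece whose
    # parameterization the question is asking about.
    return utilities[outcome]

for action in ("act", "abstain"):
    print(action, expected_value(action, outcomes, outcome_model, U))
```

The sketch only shows where U sits in the computation; it does not address the harder question of how to parameterize U so that it stays self-consistent in moral dilemmas.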