The idea is that, when the AI’s utility changes from u to v at time t, it maximises a meta-utility U
What does it mean for the meta-utility U to depend on the time step t? My understanding is that utility functions are defined over entire world histories; thus it doesn't make sense for them to depend on the time step.
My guess is that you meant that both u and v are expressed as a sum of rewards over time, and the meta-utility sums the rewards of u before t with the rewards of v after t (plus an expected reward correction term); is this correct?
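To make my guess concrete: writing r_u and r_v for the per-step reward functions that u and v sum over, and C_t for the correction term (this notation is mine, not from the post), I'm imagining something like

$$U(h) = \sum_{i < t} r_u(h_i) + \sum_{i \geq t} r_v(h_i) + C_t$$

where h is the world history, h_i is its i-th step, and C_t is the expected-reward correction applied at the switch. If that's the intended construction, U is just an ordinary utility function over histories with t appearing as a fixed parameter, rather than something that varies with the time step.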