The idea is that, when the AI’s utility changes from u to v at time t, it maximises a meta-utility U
What does it mean for the meta-utility U to depend on the time step t? My understanding is that utility functions are defined over entire world histories; thus it doesn't make sense for them to depend on the time step.
My guess is that you meant that both u and v are expressed as a sum of rewards over time, and the meta-utility sums the rewards of u before t with the rewards of v after t (plus an expected reward correction term); is this correct?
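To make my guess concrete: writing r_u and r_v for the per-step reward functions that u and v sum over, and C_t for the correction term (this notation is mine, not from the post), I'm imagining something like

$$U(h) = \sum_{i < t} r_u(h_i) + \sum_{i \geq t} r_v(h_i) + C_t$$

where h is the world history, h_i is its i-th step, and C_t is the expected-reward correction applied at the switch. If that's the intended construction, U is just an ordinary utility function over histories with t appearing as a fixed parameter, rather than something that varies with the time step.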