The technical appendix felt more difficult than the previous posts, but I had the advantage of having tried to read the paper from the preceding post yesterday, and I managed to reconstruct the graph & gamma correctly.
The early part is slightly confusing, though. I thought AU was something that belongs to an agent’s goal, but the picture made it look as if it’s a property of the object (“how fertile is the soil?”). Is the idea here that the soil-AU is slang for “AU of goal ‘plant stuff here’”?
I did interpret the first exercise as “you planned to go to the moon” and came up with stuff like “how valuable are the stones I can take home” and “how pleasant will it be to hang around.”
One thing I noticed is that the formal policies don’t allow for all possible “strategies.” In the graph we had to reconstruct, I can’t start at s1, stay at s1 for one step, and then go to s3, because a policy has to pick the same action every time it is in s1. So you could think of the larger set Π_L where the policies are allowed to depend on the time step. But I assume there’s no point unless the reward function also depends on the time step. (I don’t know anything about MDPs.)
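To make the distinction concrete, here is a minimal sketch of what I mean (hypothetical state and action names, nothing from the post):

```python
# A stationary policy maps each state to one action, so it must act the
# same way on every visit to s1. A time-dependent policy from the larger
# set Pi_L can stay at s1 on step 0 and leave on step 1.

stationary_policy = {"s1": "stay", "s3": "stay"}  # one fixed action per state

def time_dependent_policy(t, state):
    # The action may depend on the time step t, not just on the state.
    if state == "s1":
        return "stay" if t == 0 else "move_to_s3"
    return "stay"
```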
Am I correct that a deterministic transition function is a function T:S×A→S and a non-deterministic one is a function T:S×A×S→[0,1]?
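In case it helps, the toy picture I have in mind (a made-up two-state example, not from the post):

```python
# Deterministic T: S x A -> S, i.e. each (state, action) pair has exactly one successor.
def T_det(state, action):
    table = {("s1", "stay"): "s1", ("s1", "move"): "s3", ("s3", "stay"): "s3"}
    return table[(state, action)]

# Non-deterministic T: S x A x S -> [0, 1], i.e. a probability for each possible successor.
def T_stoch(state, action, next_state):
    dist = {("s1", "move"): {"s3": 0.9, "s1": 0.1}}
    return dist.get((state, action), {}).get(next_state, 0.0)

assert T_det("s1", "move") == "s3"
assert T_stoch("s1", "move", "s3") == 0.9
```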
Is the idea here that the soil-AU is slang for “AU of goal ‘plant stuff here’”?
yes
One thing I noticed is that the formal policies don’t allow for all possible “strategies.”
yeah, this is because those are “nonstationary” policies—you change your mind about what to do at a given state. A classic result in MDP theory is that you never need these policies to find an optimal policy.
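If it helps intuition, here is a toy illustration (my own made-up numbers, not from the paper): value iteration assigns one value per state, and the greedy policy read off those values picks a single action per state, so it is stationary by construction.

```python
# Toy value iteration on a made-up 2-state, 2-action MDP.
GAMMA = 0.9
STATES = ["s1", "s3"]
ACTIONS = ["stay", "move"]
transition = {("s1", "stay"): "s1", ("s1", "move"): "s3",
              ("s3", "stay"): "s3", ("s3", "move"): "s1"}
reward = {("s1", "stay"): 0.0, ("s1", "move"): 1.0,
          ("s3", "stay"): 2.0, ("s3", "move"): 0.0}

V = {s: 0.0 for s in STATES}
for _ in range(1000):
    V = {s: max(reward[(s, a)] + GAMMA * V[transition[(s, a)]] for a in ACTIONS)
         for s in STATES}

# The greedy policy is a plain state -> action map: the same action at a
# state on every visit, yet it is optimal for this (stationary) reward.
policy = {s: max(ACTIONS, key=lambda a: reward[(s, a)] + GAMMA * V[transition[(s, a)]])
          for s in STATES}
print(policy)  # {'s1': 'move', 's3': 'stay'}
```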
Am I correct that a deterministic transition function is a function T:S×A→S and a non-deterministic one is a function T:S×A×S→[0,1]?
yup!