Looking back at the sequence now, I realize that the “How agents impact each other” part was primarily about explaining why we don’t need to do that, and that the previous post was declaring victory on that front; but it took seeing the formalism here for me to really get it.
I now think of the main results of the sequence thus far as “impact depends on goals (part 1); nonetheless, an impact measure can just be about the agent’s power (part 2)”.
Yes, this is exactly what the plan was. :)
I don’t understand how (1) and (2) are conceptually different (aren’t both about causing irreversible changes?).
Yeah, but one doesn’t involve visibly destroying an object, which matters for certain impact measures (like whitelisting). You’re right that they’re quite similar.
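For concreteness, here's a toy sketch of why that distinction matters for a whitelisting-style measure (the transition encoding and example objects are made up for illustration, not the actual whitelisting implementation):

```python
# Sketch of why (1) and (2) come apart for a whitelisting-style measure.
# The transition representation and example objects are illustrative only.

# Whitelist of allowed object-class transitions (before -> after).
WHITELIST = {("dirty_dish", "clean_dish")}

def whitelist_penalty(observed_transitions):
    """Count object transitions that aren't explicitly allowed."""
    return sum(1 for t in observed_transitions if t not in WHITELIST)

# (1) Visibly destroying an object: the vase becomes shards,
#     which shows up as a non-whitelisted transition.
print(whitelist_penalty([("vase", "shards")]))  # 1 -> penalized

# (2) An irreversible change with no visible object destruction
#     (say, wedging a door shut forever) produces no class
#     transition at all, so whitelisting can't see it.
print(whitelist_penalty([]))  # 0 -> not penalized
```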
normalized.
Turns out you don’t need the normalization, per the linked SafeLife paper. I’d probably just take it out of the equations, looking back. Complication often isn’t worth it.
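To make the "just take it out" concrete, here's a toy sketch of the penalty with and without a normalization term (the Q-values and the choice of scale term are invented for illustration; this isn't the papers' actual code):

```python
import numpy as np

# Toy attainable-utility (Q) estimates for three auxiliary reward
# functions: the proposed action vs. the no-op. Made-up numbers.
q_action = np.array([0.9, 0.2, 0.5])
q_noop = np.array([0.7, 0.4, 0.5])

def aup_penalty(q_a, q_0, scale=None):
    """AUP-style penalty: total shift in auxiliary attainable
    utilities, optionally divided by a normalization term."""
    penalty = np.abs(q_a - q_0).sum()
    return penalty / scale if scale is not None else penalty

print(aup_penalty(q_action, q_noop))                      # unnormalized: 0.4
print(aup_penalty(q_action, q_noop, scale=q_noop.sum()))  # normalized: 0.25
```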
It’s also slightly confusing in this case because the post doesn’t explain it, which made me wonder, “am I supposed to understand what it’s for?” But it is explained in the conservative agency paper.
the first one [fails] at (4)
I think the n-step stepwise inaction baseline doesn’t fail at any of them?
Yeah, but the first one was “[comparing AU for aux. goal if I do this action to] AU for aux. goal if I do nothing”.
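To pin down the two comparisons, here's a toy sketch (the environment interface, i.e. `step`, `aux_value`, and `NOOP`, is something I'm inventing for illustration, not code from the posts):

```python
# Toy sketch of the two comparisons (my framing, not a canonical
# implementation): a one-step "act vs. do nothing" check, and an
# n-step stepwise inaction baseline that lets delayed effects play out.

NOOP = 0  # placeholder do-nothing action

def rollout(state, actions, step):
    for a in actions:
        state = step(state, a)
    return state

def one_step_inaction_delta(state, action, step, aux_value):
    """The quoted "first one": AU for the aux. goal if I take this
    action vs. AU for the aux. goal if I do nothing."""
    return abs(aux_value(step(state, action)) - aux_value(step(state, NOOP)))

def n_step_stepwise_delta(state, action, step, aux_value, n):
    """n-step stepwise inaction: act (or don't), then follow no-ops
    for n steps before comparing attainable utilities, so delayed
    effects have time to show up."""
    after_action = rollout(step(state, action), [NOOP] * n, step)
    after_inaction = rollout(step(state, NOOP), [NOOP] * n, step)
    return abs(aux_value(after_action) - aux_value(after_inaction))

# Tiny example: state is a number, an action adds to it, and the
# auxiliary attainable utility is just the state itself.
step = lambda s, a: s + a
print(one_step_inaction_delta(0, 3, step, aux_value=float))     # 3.0
print(n_step_stepwise_delta(0, 3, step, aux_value=float, n=5))  # 3.0
```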