Turns out you don’t need the normalization, per the linked SafeLife paper. Looking back, I’d probably just take it out of the equations; the extra complication often isn’t worth it.
It’s also slightly confusing in this case because the post doesn’t explain it, which made me wonder, “am I supposed to understand what it’s for?” But it is explained in the conservative agency paper.
I think the n-step stepwise inaction baseline doesn’t fail at any of them?
Yeah, but the first one was “[comparing AU for aux. goal if I do this action to] AU for aux. goal if I do nothing”
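For concreteness, here's a rough sketch of the comparison in that quote, with the normalization as an optional switch so you can see what dropping it changes. This is only a sketch that assumes the Conservative Agency form: the penalty sums |AU(aux goal i | action) − AU(aux goal i | do nothing)| over the auxiliary goals, and the normalized version divides by the sum of the do-nothing AUs. The Q-values are passed in directly rather than learned, and the names (`aup_reward`, `lam`, `normalize`) are hypothetical.

```python
from typing import Sequence


def aup_reward(
    reward: float,
    q_action: Sequence[float],
    q_noop: Sequence[float],
    lam: float,
    normalize: bool = True,
) -> float:
    """Hypothetical AUP-style shaped reward (illustrative sketch only).

    q_action[i]: attainable utility for auxiliary goal i if the agent takes the action.
    q_noop[i]:   attainable utility for auxiliary goal i if the agent does nothing.
    """
    if len(q_action) != len(q_noop):
        raise ValueError("need one action Q-value and one no-op Q-value per auxiliary goal")

    # Penalty: total absolute change in attainable utility versus doing nothing.
    penalty = sum(abs(qa - qn) for qa, qn in zip(q_action, q_noop))

    if normalize:
        # Assumed normalization: divide by the summed no-op attainable utilities.
        scale = sum(q_noop)
        if scale <= 0:
            raise ValueError("normalization needs a positive no-op scale")
        penalty /= scale

    return reward - lam * penalty


# Doing nothing leaves every auxiliary AU unchanged, so there is no penalty.
print(aup_reward(1.0, [2.0, 1.0], [2.0, 1.0], lam=0.5))
# The same action scored with and without the normalization.
print(aup_reward(1.0, [2.0, 3.0], [2.0, 1.0], lam=0.5, normalize=False))
print(aup_reward(1.0, [2.0, 3.0], [2.0, 1.0], lam=0.5, normalize=True))
```

Without the normalization, the scale is absorbed into the choice of λ, which is part of why it seems safe to drop.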