TurnTrout comments on Attainable Utility Preservation: Empirical Results

TurnTrout 22 Feb 2020 20:07 UTC
LW: 2 AF: 1
Decreases or increases?

Decreases. Here, the “human” is just a block which paces back and forth. Removing the block removes access to all states containing that block.
1. Is “Model-free AUP” the same as “AUP stepwise”?
Yes. See the paper for more details.
2. Why does “Model-free AUP” wait for the pallet to reach the human before moving, while the “Vanilla” agent does not?
I’m pretty sure it’s just an artifact of the training process and the penalty term. I remember investigating it in 2018 and concluding it wasn’t anything important, but unfortunately I don’t recall the exact explanation.

I wonder how this interacts with environments where access to states is always closing off. (StarCraft, Go, Chess, etc. - though it’s harder to think of how state/agent are ‘contained’ in these games.)

It would still try to preserve as much access to future states as it can, measured relative to doing nothing on that turn.
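
To make "with respect to doing nothing that turn" concrete, here is a minimal sketch of a stepwise penalty that compares each action against a no-op. It assumes we already have Q-value estimates for a handful of auxiliary reward functions at the current state. The names `aup_penalty`, `aup_reward`, `q_aux`, and `lam` are illustrative, not taken from the linked code, and the sketch leaves out any penalty scaling term.

```python
from typing import Sequence


def aup_penalty(q_aux: Sequence[Sequence[float]], action: int, noop: int) -> float:
    """Average absolute change in attainable auxiliary value, relative to the no-op.

    q_aux[i][a] is an (assumed, precomputed) Q-value estimate for auxiliary
    reward i when taking action a in the current state.
    """
    diffs = [abs(q[action] - q[noop]) for q in q_aux]
    return sum(diffs) / len(diffs)


def aup_reward(reward: float, q_aux: Sequence[Sequence[float]],
               action: int, noop: int, lam: float = 0.1) -> float:
    """Primary reward minus the penalty, weighted by a hypothetical coefficient lam."""
    return reward - lam * aup_penalty(q_aux, action, noop)


# Two auxiliary rewards, two actions (index 0 is the no-op).
q_aux = [[1.0, 0.5], [2.0, 2.0]]
penalty_noop = aup_penalty(q_aux, action=0, noop=0)  # the no-op is never penalized
penalty_move = aup_penalty(q_aux, action=1, noop=0)  # action 1 loses some attainable value
shaped = aup_reward(1.0, q_aux, action=1, noop=0, lam=0.1)
```

Because the baseline is recomputed at every step, the comparison still makes sense in a game where options narrow no matter what: only the difference between the chosen action and the no-op is penalized, not the narrowing that would have happened anyway.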

Is the code for the SafeLife PPO-AUP stuff you did on github?

Here. Note that we’re still ironing things out, but the preliminary results have been pretty solid.