I don’t know what the autoencoder’s doing well enough to make a prediction there, other than the baseline prediction of “smaller changes to the agent’s set of attainable utilities are harder to detect.” I think a bigger problem will be spatial distance: in a free-ranging robotics task, if the agent has a big impact on something a mile away, maybe that’s unlikely to show up in any of the auxiliary value estimates, and so it’s unlikely to be penalized.
What if the encoding-difference penalty were applied after a counterfactual rollout of no-ops following either the candidate action or a no-op? Couldn’t that detect the “butterfly effects” of small but impactful actions, and so avoid “salami slicing” exploits?
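Roughly what I have in mind, as a minimal Python sketch: let the action’s downstream consequences unfold under no-ops, then compare encodings against a pure no-op rollout. (Here env.set_state, encode, and NOOP are made-up placeholders for a resettable simulator, a state encoder, and a no-op action, not anything from the post.)

```python
import numpy as np

NOOP = 0  # hypothetical no-op action index

def delayed_encoding_penalty(env, state, action, encode, horizon=50):
    """Distance between state encodings after a tail of no-ops following
    `action` vs. following a no-op, so delayed 'butterfly effects' count."""
    def rollout(first_action):
        env.set_state(state)          # hypothetical reset-to-state
        s, *_ = env.step(first_action)
        for _ in range(horizon):      # let consequences unfold under no-ops
            s, *_ = env.step(NOOP)
        return encode(s)
    z_action = rollout(action)
    z_noop = rollout(NOOP)
    return np.linalg.norm(z_action - z_noop)
```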
Building on this thought, how about comparing mutated policies to a base policy by sampling possible futures, generating distributions of the encodings up to the farthest step, and penalizing divergence from the base policy’s distributions?
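Something like the following sketch, where policies are just state-to-action callables and the “divergence” is crudely stood in for by the distance between empirical mean encodings (a proper choice would be something like MMD or KL; again, env.set_state and encode are placeholders):

```python
import numpy as np

def encoding_distribution_divergence(env, state, policy, base_policy,
                                     encode, horizon=50, n_samples=32):
    """Sample futures under each policy, collect final-step encodings,
    and compare the two empirical distributions (here: mean difference)."""
    def sample_final_encodings(pi):
        zs = []
        for _ in range(n_samples):
            env.set_state(state)          # hypothetical reset-to-state
            s = state
            for _ in range(horizon):
                s, *_ = env.step(pi(s))
            zs.append(encode(s))
        return np.stack(zs)
    z_pi = sample_final_encodings(policy)
    z_base = sample_final_encodings(base_policy)
    return np.linalg.norm(z_pi.mean(axis=0) - z_base.mean(axis=0))
```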
Or just train a sampling policy by gradient descent, using a Monte Carlo Tree Search that penalizes actions which alter the future encodings relative to a pure no-op policy.
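Concretely, the node score in the search could just be the task value minus the no-op-rollout penalty from the first sketch above; a policy trained by gradient descent against these scores would then learn to prefer actions whose future encodings stay close to the no-op policy’s. (q_value and beta are hypothetical placeholders.)

```python
def penalized_action_value(env, state, action, q_value, encode,
                           beta=1.0, horizon=50):
    """MCTS node score: task value minus the encoding-difference penalty
    (reuses delayed_encoding_penalty from the sketch above)."""
    penalty = delayed_encoding_penalty(env, state, action, encode, horizon)
    return q_value(state, action) - beta * penalty
```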
Thanks!