EliasHasle comments on Avoiding Side Effects in Complex Environments

EliasHasle 26 Apr 2023 13:13 UTC
1 point
What if the encoding-difference penalty were applied after a counterfactual rollout of no-ops following either the candidate action or a no-op? Couldn’t that detect “butterfly effects” of small but impactful actions, and so avoid “salami slicing” exploits?
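To make the question concrete, here is a minimal sketch of what I mean by the delayed penalty. Everything in it is an assumption for illustration: the toy dynamics in `step`, the identity `encode` standing in for a learned encoder, the L1 distance, and the names `NOOP`, `NUDGE` and `rollout_penalty` are all hypothetical and not taken from the post.

```python
from typing import Tuple

State = Tuple[float, float]  # hypothetical toy state: (position, velocity)
NOOP, NUDGE = 0, 1           # hypothetical action ids


def step(state: State, action: int) -> State:
    """Toy deterministic dynamics: a nudge adds velocity, which then persists."""
    x, v = state
    if action == NUDGE:
        v += 1.0
    return (x + v, v)


def encode(state: State) -> Tuple[float, ...]:
    """Stand-in for a learned encoder; here it is just the identity."""
    return tuple(state)


def l1(a, b) -> float:
    """L1 distance between two encodings of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b))


def rollout_penalty(state: State, action: int, horizon: int) -> float:
    """Encoding difference between 'action, then no-ops' and 'no-op, then no-ops'.

    horizon=0 gives the immediate one-step comparison; larger horizons let
    small initial differences grow before they are measured.
    """
    s_act = step(state, action)
    s_noop = step(state, NOOP)
    for _ in range(horizon):
        s_act = step(s_act, NOOP)
        s_noop = step(s_noop, NOOP)
    return l1(encode(s_act), encode(s_noop))
```

With these toy dynamics, a single nudge from rest at `(0.0, 0.0)` scores 2.0 at `horizon=0` but 12.0 at `horizon=10`, while a no-op scores 0.0 at any horizon. That is the kind of delayed effect an immediate comparison would understate.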
Building on this thought, how about comparing mutated policies to a base policy: sample possible futures under each to generate distributions of the encodings at every step up to the farthest one, and penalize divergence from the base policy’s distributions?
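A rough sketch of the sampling version, again under loudly stated assumptions: the noisy integer walk in `step`, the identity `encode`, total variation distance as the divergence, summing it over steps, and reusing the same random seed for both policies are all my own illustrative choices, and every name here is hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List

Policy = Callable[[int], int]  # hypothetical: maps a state to an integer action


def step(state: int, action: int, rng: random.Random) -> int:
    """Toy stochastic dynamics: the action shifts the state, plus random noise."""
    return state + action + rng.choice((-1, 0, 1))


def encode(state: int) -> int:
    """Stand-in for a learned (discretized) encoder; here the identity."""
    return state


def sample_encodings(policy: Policy, start: int, horizon: int,
                     n_samples: int, rng: random.Random) -> List[Counter]:
    """Empirical distribution of encodings at each step, from sampled futures."""
    per_step = [Counter() for _ in range(horizon)]
    for _ in range(n_samples):
        s = start
        for t in range(horizon):
            s = step(s, policy(s), rng)
            per_step[t][encode(s)] += 1
    return per_step


def total_variation(c1: Counter, c2: Counter) -> float:
    """Total variation distance between two empirical distributions."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    keys = set(c1) | set(c2)
    return 0.5 * sum(abs(c1[k] / n1 - c2[k] / n2) for k in keys)


def divergence_penalty(policy: Policy, base_policy: Policy, start: int = 0,
                       horizon: int = 5, n_samples: int = 2000,
                       seed: int = 0) -> float:
    """Sum over steps of the divergence between policy and base-policy futures.

    Both rollouts reuse the same seed (common random numbers), so the base
    policy compared with itself scores exactly zero in this sketch.
    """
    dist_p = sample_encodings(policy, start, horizon, n_samples, random.Random(seed))
    dist_b = sample_encodings(base_policy, start, horizon, n_samples, random.Random(seed))
    return sum(total_variation(p, b) for p, b in zip(dist_p, dist_b))


def noop_policy(state: int) -> int:
    """Base policy: always take the no-op (action 0)."""
    return 0


def drift_policy(state: int) -> int:
    """A 'mutated' policy that pushes the state by +1 every step."""
    return 1
```

Calling `divergence_penalty(drift_policy, noop_policy)` gives a clearly positive penalty, while `divergence_penalty(noop_policy, noop_policy)` gives zero. Whether to sum over all steps or only look at the farthest one, and which divergence to use, are open design choices.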
Or just train a sampling policy by gradient descent, using a Monte Carlo Tree Search that penalizes actions which alter the future encodings relative to a pure no-op policy.