It’s only equal to the inaction baseline on the first step; it makes the step of divergence always be the last step.
Note that the stepwise $\pi_0$ baseline suggests using a different baseline per auxiliary reward, namely taking the action that maximizes that auxiliary reward. Or equivalently, using the stepwise inaction baseline where the effect of inaction is that no time passes.
I’ll also remind here that it looks like, instead of merely maximizing the auxiliary reward to get the baseline, we ought to also apply an impact penalty when computing the baseline.
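For reference, here is a sketch of the two inaction-style baselines this thread keeps comparing (my notation, not from the thread: ∅ is the no-op action and $T$ a deterministic transition function). The inaction baseline rolls ∅ forward from the initial state, while the stepwise baseline takes a single ∅ from the agent's actual previous state:

$$
s'^{\,\text{inaction}}_t \;=\; \underbrace{T(\cdots T(T(s_0, \varnothing), \varnothing) \cdots, \varnothing)}_{t\ \text{no-ops from } s_0},
\qquad
s'^{\,\text{stepwise}}_t \;=\; T(s_{t-1}, \varnothing).
$$

At $t = 1$ both reduce to $T(s_0, \varnothing)$, which is the sense in which they agree only on the first step; and because the stepwise baseline restarts from the agent's actual $s_{t-1}$, the agent's trajectory and its baseline can only differ at the most recent action. The "no time passes" reading above would instead set $s'_t = s_{t-1}$, dropping even the single no-op.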
I’m not following you here. Could you put this into equations/examples?
Here are three sentences that might illuminate their respective paragraphs. If they don’t, ask again.
The stepwise inaction baseline with inaction rollouts already uses the same policy for $s'_t$ and the rollouts, and yet it is not the inaction baseline.
Why not set $s'_t = s_{t-1}$?
Why not subtract $\omega D$ from every $R$ (in a fixpointy way)?
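One possible reading of that last question (my guess; $D$ standing for the impact penalty and $\omega$ for its weight, neither defined in this excerpt): modify every reward, main and auxiliary, by subtracting the weighted penalty, where the penalty is itself computed from the modified rewards, so the definition is a fixed-point equation:

$$
\tilde R_i \;=\; R_i \;-\; \omega\, D\big[\tilde R_0, \dots, \tilde R_n\big], \qquad i = 0, \dots, n.
$$

Under this reading, "fixpointy" refers to $\tilde R$ appearing on both sides: the penalty is evaluated with respect to the already-penalized rewards, in the same spirit as applying an impact penalty when computing the baseline above.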
In this case, it is, because the agent A will only do ∅ from then on, to zero out the subsequent penalties.
It messes up the comparison for rewards that fluctuate based on time, it doesn’t block subagent creation… and I’ve never seen it before, so I don’t know what it could do ^_^ Do you have a well-developed version of this?
The last point I don’t understand at all.
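To illustrate the "rewards that fluctuate based on time" objection to $s'_t = s_{t-1}$ (my example, not from the thread, treating the timestep as part of the state): suppose an auxiliary reward depends on the timestep, say

$$
R_{\text{aux}}(s_t) \;=\; \begin{cases} 1 & t \text{ even,} \\ 0 & t \text{ odd.} \end{cases}
$$

Then a penalty of the form $\big| V_{\text{aux}}(s_t) - V_{\text{aux}}(s'_t) \big|$ compares a time-$t$ state against a time-$(t-1)$ state, so it registers a difference caused purely by the clock advancing; even the no-op action ∅ gets penalized, which is what messes up the comparison for such rewards.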