Rohin Shah comments on Arguments against myopic training

Rohin Shah 16 Jul 2020 23:38 UTC
LW: 14 AF: 10
AF
However, it’s worth noting that the procedure given here really looks a lot more like approval-based amplification rather than imitative amplification.
Many algorithms for imitation still involve non-myopic training (e.g. GAIL, sorta, and AIRL).
I think this is where I disagree with this argument. I think you can get myopic agents which are competitive on long-run tasks because they are trying to do something like “be as close to HCH as possible” which results in good long-run task performance without actually being specified in terms of the long-term consequences of the agent’s actions.
… Why isn’t this compatible with saying that the supervisor (HCH) is “able to accurately predict how well their actions fulfil long-term goals”? Like, HCH presumably takes those actions because it thinks those actions are good for long-term goals.
What links here?
- abramdemski's comment on Arguments against myopic training by Richard_Ngo (21 Jul 2020 21:24 UTC; 14 points)
- evhub 21 Jul 2020 23:29 UTC
  LW: 4 AF: 3
  AF Parent
  
  … Why isn’t this compatible with saying that the supervisor (HCH) is “able to accurately predict how well their actions fulfil long-term goals”? Like, HCH presumably takes those actions because it thinks those actions are good for long-term goals.
  
  In the imitative case, the overseer never makes a determination about how effective the model’s actions will be at achieving anything. Rather, the overseer is only trying to produce the best answer for itself, and the loss is determined via a distance metric. While the overseer might very well try to determine how effective its own actions will be at achieving long-term goals, it never evaluates how effective the model’s actions will be. I see this sort of trick as the heart of what makes the counterfactual oracle analogy work.
  - Rohin Shah 22 Jul 2020 1:33 UTC
    LW: 6 AF: 4
    AF Parent
    I don’t really understand what you’re saying here. A thing you might be saying:
    Imitative amplification doesn’t have to deal with the informed oversight problem, since the overseer evaluates its own actions rather than the agent’s actions.
    If that is what you’re saying, I don’t see why this is relevant to whether or not we should use myopic training?
    (It’s possible I need to reread the counterfactual oracle analogy, though I did skim it right now and didn’t immediately see the relevance.)
    - evhub 22 Jul 2020 19:01 UTC
      LW: 4 AF: 2
      AF Parent
      My point here is that I think imitative amplification (if you believe it’s competitive) is a counter-example to Richard’s argument in his “Myopic training doesn’t prevent manipulation of supervisors” section, since any manipulative actions that an imitative amplification model takes aren’t judged by their consequences but only by how closely they match what the overseer would do.
      - Rohin Shah 22 Jul 2020 19:41 UTC
        LW: 2 AF: 2
        AF Parent
        That seems to be a property of myopic cognition rather than myopic training? (See also this comment.)
    - Richard_Ngo 22 Jul 2020 7:45 UTC
      LW: 2 AF: 1
      AF Parent
      I’m also confused.
      
      “While the overseer might very well try to determine how effective its own actions will be at achieving long-term goals, it never evaluates how effective the model’s actions will be.”
      
      Evan, do you agree that for the model to imitate the actions of the supervisor, it would be useful to mimic some of the thought processes the supervisor uses when generating those actions?
      
      In other words, if HCH is pursuing goal X, what feature of myopic training selects for a model that is internally thinking “I’m going to try to be as close to HCH as possible in this timestep, which involves reasoning about how HCH would pursue X”, versus a model that’s thinking “I’m going to pursue goal X”? (To the extent these are different, which I’m still confused about.)
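
The distinction evhub draws in this thread, between a loss computed as a distance to the overseer's own answer and a loss computed from the overseer's evaluation of the model's action, can be made concrete with a toy sketch. Everything here is an illustrative assumption rather than anything proposed by the commenters: actions are plain vectors, the distance metric is squared Euclidean distance, and `flawed_rating` is a hypothetical overseer rating in which the second coordinate stands in for "persuasiveness" that inflates approval. The sketch only shows which quantity each loss depends on. It does not address Rohin's and Richard's question about what cognition the trained model ends up with.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, standing in for 'a distance metric'."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def imitative_loss(model_action: Vector, overseer_action: Vector) -> float:
    """Imitative case: the loss depends only on how far the model's action
    is from what the overseer would do itself. The overseer never rates
    the model's action."""
    return squared_distance(model_action, overseer_action)


def approval_loss(model_action: Vector,
                  rate: Callable[[Vector], float]) -> float:
    """Approval-based case: the overseer evaluates the model's action,
    so a flaw in that evaluation can be exploited."""
    return -rate(model_action)


def flawed_rating(action: Vector) -> float:
    """Hypothetical exploitable rating: coordinate 0 is task quality,
    coordinate 1 is persuasiveness that the overseer wrongly rewards."""
    return action[0] + 5.0 * action[1]


# Toy actions (assumed for illustration only).
overseer_action = [1.0, 0.0]   # what the overseer would do itself
honest = [1.0, 0.0]            # matches the overseer exactly
manipulative = [1.0, 1.0]      # same task quality, plus persuasion

imitative_honest = imitative_loss(honest, overseer_action)
imitative_manipulative = imitative_loss(manipulative, overseer_action)
approval_honest = approval_loss(honest, flawed_rating)
approval_manipulative = approval_loss(manipulative, flawed_rating)

print("imitative:", imitative_honest, imitative_manipulative)
print("approval: ", approval_honest, approval_manipulative)
```

In this toy setup the manipulative action gets a lower loss than the honest one under the approval-based objective, but a higher loss under the imitative objective, because the imitative loss never consults the overseer's opinion of the model's action.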