Wei Dai comments on Can corrigibility be learned safely?

Wei Dai 3 May 2018 20:49 UTC
LW: 4 AF: 1
AF

The constraint on the amplification process is that learning the full set of subtasks can’t be that much harder than simply learning the task.

I propose the following as an example of a task where learning the full set of subtasks is much harder than simply learning the task. Suppose we’re trying to predict quantum mechanical systems: specifically, we’re given a molecule and asked to predict some property of it.

How would this work with amplification? If I’m not misunderstanding something, assuming the overseer knows QM, one of the subtasks would be to do a QM simulation (via meta-execution), and that seems much harder for ML to learn than just predicting a specific property. If the overseer does not know QM, one of the subtasks would have to be to do science and invent QM, which seems even harder to learn.

This seems to show that H can’t always produce a transcript for A to do imitation learning or inverse reinforcement learning from, so the only option left for the distillation process is direct supervision?
- paulfchristiano 5 May 2018 19:30 UTC
  LW: 6 AF: 3
  AF Parent
  You don’t have to do QM to make predictions about the particle. The goal is for IDA to find whatever structure allows the RL agent to make a prediction. (The exponential tree will solve the problem easily, but if we interleave distillation steps then many of those subtrees will get stuck because the agent isn’t able to learn to handle them.)
  In some cases this will involve opaque structures that happen to make good predictions. In that case, we need to make a safety argument about “heuristic without internal structure that happens to work.”
  - Wei Dai 6 May 2018 22:55 UTC
    LW: 2 AF: 1
    AF Parent
    You don’t have to do QM to make predictions about the particle. The goal is for IDA to find whatever structure allows the RL agent to make a prediction.
    My thought here is why try to find this structure inside meta-execution? It seems counterintuitive / inelegant that you have to worry about the safety of learned / opaque structures in meta-execution, and then again in the distillation step. Why don’t we let the overseer directly train some auxiliary ML models at each iteration of IDA, using whatever data the overseer can obtain (in this case empirical measurements of molecule properties) and whatever transparency / robustness methods the overseer wants to use, and then make those auxiliary models available to the overseer at the next iteration?
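    A minimal sketch of the loop proposed above, under loud assumptions: every name here (`Overseer`, `train_auxiliary`, `max_error_check`, `run_iterations`) is hypothetical, a least-squares line fit stands in for "train an auxiliary ML model", and a crude error check stands in for whatever transparency / robustness methods the overseer would actually use. It only illustrates the bookkeeping, namely that models accepted at one iteration become tools at the next.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a "model" maps one number to a predicted number,
# and a dataset is a list of (input, measured property) pairs.
Model = Callable[[float], float]
Dataset = List[Tuple[float, float]]


def fit_linear(data: Dataset) -> Model:
    """Stand-in for training an auxiliary model: ordinary least-squares line."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def max_error_check(tolerance: float) -> Callable[[Model, Dataset], bool]:
    """Placeholder for a real robustness check: bounded error on the data.

    Assumption: a real check would use held-out data and transparency tools;
    checking the training data is only a toy substitute.
    """
    def check(model: Model, data: Dataset) -> bool:
        return all(abs(model(x) - y) <= tolerance for x, y in data)
    return check


@dataclass
class Overseer:
    # Auxiliary models the overseer may call, keyed by name.
    tools: Dict[str, Model] = field(default_factory=dict)

    def train_auxiliary(self, name: str, data: Dataset,
                        passes_checks: Callable[[Model, Dataset], bool]) -> bool:
        """Train a model and keep it only if it passes the overseer's checks."""
        model = fit_linear(data)
        if passes_checks(model, data):
            self.tools[name] = model
            return True
        return False


def run_iterations(datasets: List[Dataset], tolerance: float):
    """One dataset per iteration; record which tools each iteration starts with."""
    overseer = Overseer()
    tools_at_start: List[List[str]] = []
    for i, data in enumerate(datasets):
        tools_at_start.append(sorted(overseer.tools))
        overseer.train_auxiliary(f"aux_{i}", data, max_error_check(tolerance))
    return overseer, tools_at_start
```

    In this toy version, a model that fails the check is simply discarded, so later iterations never see it.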
    - paulfchristiano 7 May 2018 1:59 UTC
      LW: 2 AF: 1
      AF Parent
      It seems counterintuitive / inelegant that you have to worry about the safety of learned / opaque structures in meta-execution, and then again in the distillation step.
      I agree, I think it’s unlikely the final scheme will involve doing this work in two places.
      Why don’t we let the overseer directly train some auxiliary ML models at each iteration of IDA, using whatever data the overseer can obtain (in this case empirical measurements of molecule properties) and whatever transparency / robustness methods the overseer wants to use, and then make those auxiliary models available to the overseer at the next iteration?
      This is a way that things could end up looking. I think there are more natural ways to do this integration though.
      Note that in order for any of this to work, amplification probably needs to be able to replicate/verify all (or most) of the cognitive work the ML model does implicitly, so that we can do informed oversight. There will be opaque heuristics that “just work,” which are discovered either by ML or meta-execution trial-and-error, but then we need to confirm safety for those heuristics.