An argument form that I like: if your argument for X works, it should continue to work when you add an independent assumption Y.

I think this should be convincing even if Y is false, unless you can explain why your argument for X does not work under assumption Y.
An example: any AI safety story (X) should also work if you assume that the AI does not have the ability to take over the world during training (Y).
Trying to follow this. Doesn’t Y (the AI not taking over the world during training) make it less likely that X (the AI will take over the world at all)?
Which seems to contradict the argument structure. Perhaps you could give a few more examples to make the structure clearer?
In that example, X is “AI will not take over the world”, so Y makes X more likely. So if someone comes to me and says “If we use <technique>, then AI will be safe”, I might respond, “well, if we were using your technique, and we assume that AI does not have the ability to take over the world during training, it seems like the AI might still take over the world at deployment because <reason>”.
I don’t think this is a great example; it just happens to be the one I was using at the time, and I wanted to write it down. I’m explicitly trying for this to be a low-effort thing, so I’m not going to try to write more examples now.
EDIT: Actually, the double descent comment below has a similar structure, where X = “double descent occurs because we first fix bad errors and then regularize”, and Y = “we’re using an MLP / CNN with relu activations and vanilla gradient descent”.
In fact, the AUP power comment does this too, where X = “we can penalize power by penalizing the ability to gain reward”, and Y = “the environment is deterministic, has a true noop action, and has a state-based reward”.
Maybe another way to say this is:
I endorse applying the “X proves too much” argument even to impossible scenarios, as long as the assumptions underlying the impossible scenarios have nothing to do with X. (Note this is not the case in formal logic, where if you start with an impossible scenario you can prove anything, and so you can never apply an “X proves too much” argument to an impossible scenario.)
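The formal-logic caveat here is the principle of explosion (ex falso quodlibet): from a contradictory premise, every proposition follows, so in a strictly formal setting an impossible scenario "proves" everything and the “proves too much” test is vacuous. A minimal sketch of that fact in Lean 4:

```lean
-- Principle of explosion: from a proof of False, any proposition P follows.
example (P : Prop) (h : False) : P := False.elim h

-- Equivalently, jointly contradictory premises (Q and ¬Q) prove anything.
example (P Q : Prop) (hq : Q) (hnq : ¬Q) : P := absurd hq hnq
```

This is why the endorsement above is restricted to informal reasoning, where the impossible assumption Y is kept separate from the argument for X rather than being fed into a logic that would let the contradiction leak everywhere.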