In that example, X is “AI will not take over the world”, so Y makes X more likely. So if someone comes to me and says “If we use <technique>, then AI will be safe”, I might respond, “well, if we were using your technique, and we assume that AI does not have the ability to take over the world during training, it seems like the AI might still take over the world at deployment because <reason>”.
I don’t think this is a great example; it just happens to be the one I was using at the time, and I wanted to write it down. I’m explicitly trying for this to be a low-effort thing, so I’m not going to try to write more examples now.
EDIT: Actually, the double descent comment below has a similar structure, where X = “double descent occurs because we first fix bad errors and then regularize”, and Y = “we’re using an MLP / CNN with relu activations and vanilla gradient descent”.
In fact, the AUP power comment does this too, where X = “we can penalize power by penalizing the ability to gain reward”, and Y = “the environment is deterministic, has a true noop action, and has a state-based reward”.
Maybe another way to say this is:
I endorse applying the “X proves too much” argument even to impossible scenarios, as long as the assumptions underlying the impossible scenarios have nothing to do with X. (Note this is not the case in formal logic, where if you start with an impossible scenario you can prove anything, and so you can never apply an “X proves too much” argument to an impossible scenario.)
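The formal-logic caveat above is the principle of explosion (ex falso quodlibet): from a contradictory premise, every proposition follows, so "proving too much" from an impossible starting point carries no information. A minimal sketch of this in Lean 4 (the proposition names here are just illustrative):

```lean
-- Principle of explosion: from False, any proposition P follows.
example (P : Prop) (h : False) : P :=
  False.elim h

-- If an "impossible scenario" is encoded as a contradictory hypothesis
-- (here, 1 = 0 over the naturals), it proves every proposition at once,
-- which is why "X proves too much" is vacuous in that setting.
example (h : 1 = 0) (P : Prop) : P :=
  absurd h (by decide)
```

This is exactly why the endorsement above is restricted to the informal case: informal "impossible scenarios" are not treated as logically contradictory premises, so deriving an absurd conclusion from X under them can still be evidence against X.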