We know that SGD is selecting for models based on some combination of loss and inductive biases, but we don’t know the exact tradeoff.
Actually, I’ve thought more, and I don’t think that this dual-optimization perspective makes it better. I deny that we “know” that SGD is selecting on that combination, in the sense which seems to be required for your arguments to go through.
It sounds to me like I said “here’s why you can’t think about ‘what gets low loss’” and then you said[1] “but what if I also think about certain inductive biases too?” and then you also said “we know that it’s OK to think about it this way.” No, I contend that we don’t know that. That was a big part of what I was critiquing.
As an alert: it feels like your response here isn’t engaging with the points I raised in my original comment. I expect I talked past you and, accordingly, you haven’t seen the thing I’m trying to point at.
[1] This isn’t a quote; it’s just how your comment parsed to me.