I am very confused now about what you believe. Obviously training selects for low-loss algorithms… that’s the whole point of training? I thought you were saying that training doesn’t select for algorithms that internally optimize for loss, which is true, but it definitely does select for algorithms that in fact get low loss.
The point of training in a practical sense is generally to produce networks with desirable behavior. The point of training in a dynamical sense is to perform an optimizer-mediated update to locally reduce loss along the locally steepest direction, aggregating gradients over different subsets of the data.
What is the empirical content of the claim that “training selects for low loss algorithms”? Can you make it more precise, perhaps by tabooing “selects for”?
I’ll register a prediction here: TurnTrout is trying to say that while, counterfactually, if we had an algorithm that reasons about training, it would achieve low loss, it’s not obviously true that such algorithms are actually “achievable” for SGD in some “natural” setting.
That’s what I thought he was saying previously, but he objected to that characterization in his most recent comment.
Wait, where? I think the objection to “Doing that is quite hard” is not an objection to “it’s not obviously true that such algorithms are actually ‘achievable’ for SGD”; it’s an objection to the conclusion that the model would try hard enough to justify arguments about deception from a weak statement about loss decreasing during training.
This is… roughly one point I was making, yes.