Agreed, there can be a optimum. But I think the intuition here is that it is exceedingly rare enough to run into a situation where it is local optima in all “directions”.
It is only an “optimum” when all 175 billion parameters are telling you to screw off and stop trying.
Agreed, there can be a optimum. But I think the intuition here is that it is exceedingly rare enough to run into a situation where it is local optima in all “directions”.
It is only an “optimum” when all 175 billion parameters are telling you to screw off and stop trying.