I personally dislike “converged” because it implies that the optimal policy is inevitable. If you reach that policy, then yes you have converged. However, the converse (“if you have not reached an optimal policy, then you have not converged”) is not true in general. Even in the supervised regime (with a stationary data distribution) you can have local minima or zero-determinant saddle points (i.e. flat regions in the loss landscape).
Mathematically, convergence just means that the distance to some limit point goes to 0 in the limit. There’s no implication that the limit point has to be unique, or optimal. Eg. in the case of Newton fractals, there are multiple roots and the trajectory converges to one of the roots, but which one it converges to depends on the starting point of the trajectory. Once the weight updates become small enough, we should say the net has converged, regardless of whether it achieved the “optimal” loss or not.
If even “converged” is not good enough, I’m not sure what one could say instead. Probably the real problem in such cases is people being doofuses, and probably they will continue being doofuses no matter what word we force them to use.
You raise good points. I agree that the mathematical definition of convergence does not insinuate uniqueness or optimality, thanks for reminding me of that.
I personally dislike “converged” because it implies that the optimal policy is inevitable. If you reach that policy, then yes you have converged. However, the converse (“if you have not reached an optimal policy, then you have not converged”) is not true in general. Even in the supervised regime (with a stationary data distribution) you can have local minima or zero-determinant saddle points (i.e. flat regions in the loss landscape).
Mathematically, convergence just means that the distance to some limit point goes to 0 in the limit. There’s no implication that the limit point has to be unique, or optimal. Eg. in the case of Newton fractals, there are multiple roots and the trajectory converges to one of the roots, but which one it converges to depends on the starting point of the trajectory. Once the weight updates become small enough, we should say the net has converged, regardless of whether it achieved the “optimal” loss or not.
If even “converged” is not good enough, I’m not sure what one could say instead. Probably the real problem in such cases is people being doofuses, and probably they will continue being doofuses no matter what word we force them to use.
You raise good points. I agree that the mathematical definition of convergence does not insinuate uniqueness or optimality, thanks for reminding me of that.
Adding to this: You will also have a range of different policies which your model alternates between.