TurnTrout comments on Dreams of AI alignment: The danger of suggestive names

TurnTrout 12 Feb 2024 18:59 UTC
2 points
0
I personally dislike “converged” because it implies that the optimal policy is inevitable. If you reach that policy, then yes, you have converged. However, the converse (“if you have converged, then you have reached an optimal policy”) is not true in general. Even in the supervised regime (with a stationary data distribution), training can settle in local minima or at degenerate saddle points where the Hessian has zero determinant (i.e. flat regions in the loss landscape).
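
A minimal sketch of the local-minimum case, under loudly stated assumptions: the one-dimensional loss `x**4 - 3*x**2 + x`, the learning rate, the tolerance, and the helper name `descend` are all made up for illustration and do not come from any real training setup. Plain gradient descent stops updating in both runs, but only one of them ends at the lower of the two minima.

```python
def loss(x: float) -> float:
    # Hypothetical non-convex loss with two minima of different depth.
    return x**4 - 3 * x**2 + x


def grad(x: float) -> float:
    # Analytic derivative of the loss above.
    return 4 * x**3 - 6 * x + 1


def descend(x0: float, lr: float = 0.01, tol: float = 1e-10,
            max_steps: int = 10_000) -> tuple[float, bool]:
    """Run plain gradient descent from x0.

    Returns the final point and whether the updates shrank below `tol`,
    which is the sense of "converged" used here: updates became tiny,
    with no claim about optimality.
    """
    x = x0
    for _ in range(max_steps):
        update = lr * grad(x)
        x -= update
        if abs(update) < tol:
            return x, True
    return x, False


x_right, converged_right = descend(2.0)   # starts in the right-hand basin
x_left, converged_left = descend(-2.0)    # starts in the left-hand basin

print(f"from  2.0: x = {x_right:.4f}, loss = {loss(x_right):.4f}, converged = {converged_right}")
print(f"from -2.0: x = {x_left:.4f}, loss = {loss(x_left):.4f}, converged = {converged_left}")
```

Under these assumptions both runs should report convergence, with the run started at 2.0 settling near x ≈ 1.13 and the run started at -2.0 settling near x ≈ -1.30, where the loss is lower. Only the starting point differs between the two runs.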
- DaemonicSigil 13 Feb 2024 7:17 UTC
  4 points
  2
  Mathematically, convergence just means that the distance to some limit point goes to 0 in the limit. There’s no implication that the limit point has to be unique, or optimal. E.g., in the case of Newton fractals, there are multiple roots and the trajectory converges to one of the roots, but which one it converges to depends on the starting point of the trajectory. Once the weight updates become small enough, we should say the net has converged, regardless of whether it achieved the “optimal” loss or not.
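  
  To spell out the definition being used here (a sketch in standard notation; the symbols $x_n$, $x^*$, $f$, and $z_n$ are introduced for illustration and are not part of the original comment): a sequence $x_n$ converges to a limit point $x^*$ when

  $$\lim_{n \to \infty} \lVert x_n - x^* \rVert = 0 .$$

  Nothing in this condition requires $x^*$ to be unique or optimal. For the Newton fractal example, the iteration is

  $$z_{n+1} = z_n - \frac{f(z_n)}{f'(z_n)} ,$$

  and, taking $f(z) = z^3 - 1$ as one assumed instance, there are three roots $1$, $e^{2\pi i/3}$, and $e^{-2\pi i/3}$. Which of them a trajectory approaches depends on the starting point $z_0$.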
  
  If even “converged” is not good enough, I’m not sure what one could say instead. Probably the real problem in such cases is people being doofuses, and probably they will continue being doofuses no matter what word we force them to use.
  - TurnTrout 19 Feb 2024 19:29 UTC
    4 points
    0
    You raise good points. I agree that the mathematical definition of convergence does not imply uniqueness or optimality. Thanks for reminding me of that.
- Garrett Baker 13 Feb 2024 4:22 UTC
  2 points
  0
  Adding to this: You will also have a range of different policies which your model alternates between.