You can replace “optimal” with “artifact equilibrated under policy update operations”
I don’t think most people can. If you don’t like the connotations of existing terms, I think you need to come up with new terms and they can’t be too verbose or people won’t use them.
One thing that makes these discussions tricky is that the aptness of these names likely depends on your object-level position. If you hold the AI optimist position, then you likely feel these names are biasing people toward an incorrect conclusion. If you hold the AI pessimist position, you likely see many of these connotations as actually a positive, in terms of pointing people toward useful metaphors, even if people occasionally slip up and reify the terms.
Also, have you tried having a moderated conversation with someone who disagrees with you? Sometimes that can help resolve communication barriers.
I suspect that if they can’t ground it out to the words underneath, then there should be some way to make that concrete as a prediction that their model is drastically more fragile than their words make it sound. If you cannot translate your thinking into math fluently, then your thinking is probably not high enough quality yet, or so I’d claim. And I certainly propose this test expecting myself to fail it plenty often enough.
@TurnTrout: I’d really, really like to see you have a discussion with someone with a similar level of education about deep learning who disagrees with you about the object-level claims. If possible, I’d like it to be Bengio. I think the two of you discussing the mechanics of the problem at hand would yield extremely interesting insights. I expect the best format for it would be a series of emails back and forth, a LessWrong dialogue, or some other compatible asynchronous messaging format without outside observers until the discussion has progressed to a point where both participants feel it is ready to share. Moderation could potentially help, though I expect it to be unnecessary.
I’m not saying that people can’t ground it out. I’m saying that if you try to think or communicate using really verbose terms it’ll reduce your available working memory which will limit your ability to think new thoughts.
Yes, I agree that this is an impractical phrase substitution for “optimal.” I meant to be listing “ways you can think about alignment more precisely” and then also “I wish we had better names for actual communication.” Maybe I should have made more explicit note of this earlier in the essay.
EDIT: I now see that you seem to think this is also an impractical thought substitution. I disagree with that, but can’t speak for “most” people.
On the actual object level for the word “optimal”, people already usually say “converged” for that meaning and I think that’s a good choice.
I personally dislike “converged” because it implies that the optimal policy is inevitable. If you reach that policy, then yes, you have converged. However, the converse (“if you have not reached an optimal policy, then you have not converged”) is not true in general. Even in the supervised regime (with a stationary data distribution) you can get stuck in local minima or in degenerate saddle points whose Hessian has zero determinant (i.e. flat regions in the loss landscape).
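To make this concrete, here is a minimal sketch in Python (the toy loss, learning rate, and tolerance are all made up for illustration): gradient descent on a 1-D loss with two minima. Both runs terminate because the updates shrink below tolerance, i.e. they have “converged,” but only one of them reaches the lower-loss minimum.

```python
# Toy illustration (made-up loss and hyperparameters): "converged" != "optimal".
# f(x) = (x^2 - 1)^2 + 0.3 x has a global minimum near x = -1 and a shallower
# local minimum near x = +1; which one gradient descent settles into depends
# only on where it starts.

def f(x):
    return (x**2 - 1) ** 2 + 0.3 * x

def grad_f(x):
    return 4 * x * (x**2 - 1) + 0.3

def gradient_descent(x0, lr=0.01, tol=1e-8, max_steps=100_000):
    x = x0
    for _ in range(max_steps):
        step = lr * grad_f(x)
        x -= step
        if abs(step) < tol:  # updates have become negligible: we call this "converged"
            break
    return x

for x0 in (-2.0, 2.0):
    x_star = gradient_descent(x0)
    print(f"start={x0:+.1f}  converged to x={x_star:+.4f}  loss={f(x_star):+.4f}")

# Both runs converge, but the run started at +2.0 ends in the local minimum,
# with noticeably higher loss than the run started at -2.0.
```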
Mathematically, convergence just means that the distance to some limit point goes to 0 in the limit. There’s no implication that the limit point has to be unique or optimal. E.g. in the case of Newton fractals, there are multiple roots and the trajectory converges to one of the roots, but which one it converges to depends on the starting point of the trajectory. Once the weight updates become small enough, we should say the net has converged, regardless of whether it achieved the “optimal” loss or not.
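Here is a quick sketch of that Newton-fractal point in Python (the polynomial and starting points are just an illustrative choice): Newton’s method on z³ − 1 has three roots, every run below converges in the sense that successive iterates stop moving, and which root you land on depends only on where you start.

```python
# Toy illustration of the Newton-fractal point: convergence does not pick out
# a unique limit. Newton's method on p(z) = z^3 - 1 converges from (almost)
# any starting point, but the root it reaches depends on that starting point.

def newton(z0, tol=1e-12, max_iter=200):
    z = z0
    for _ in range(max_iter):
        z_next = z - (z**3 - 1) / (3 * z**2)   # Newton step for z^3 - 1
        if abs(z_next - z) < tol:              # iterates stop moving: "converged"
            return z_next
        z = z_next
    return z

for z0 in (1.5 + 0.1j, -1.0 + 1.0j, -1.0 - 1.0j):
    root = newton(z0)
    print(f"start = {z0}  ->  root ≈ {root.real:+.4f} {root.imag:+.4f}j")

# The three starts land on the three different cube roots of unity
# (1, -0.5+0.866j, -0.5-0.866j): all "converged", none uniquely "optimal".
```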
If even “converged” is not good enough, I’m not sure what one could say instead. Probably the real problem in such cases is people being doofuses, and probably they will continue being doofuses no matter what word we force them to use.
You raise good points. I agree that the mathematical definition of convergence does not insinuate uniqueness or optimality, thanks for reminding me of that.
Adding to this: You will also have a range of different policies which your model alternates between.