If I were trying to make this model work, I’d use mainly self-supervised learning that’s aimed at getting the module to predict what a typical human would feel.
I don’t follow. Can you explain in more detail? “Self-supervised learning” means training a model to predict some function / subset of the input data from a different function / subset of the input data, right? What’s the input data here, and what is the prediction target?
I haven’t thought this out very carefully. I’m imagining a transformer trained both to predict text, and to predict the next frame of video.
Train it on all available videos that show realistic human body language.
Then ask the transformer to rate on a numeric scale how positively or negatively a human would feel in any particular situation.
This does not seem sufficient for a safe result, but implies that LeCun is less nutty than your model of him suggests.
I’m still confused. Here you’re describing what you’re hoping will happen at inference time. I’m asking how it’s trained, such that that happens. If you have a next-frame video predictor, you can’t ask it how a human would feel. You can’t ask it anything at all—except “what might be the next frame of thus-and-such video?”. Right?
I wonder if you’ve gotten thrown off by ChatGPT etc. Those are NOT trained purely by SSL, and therefore NOT indicative of how pure-SSL-trained models behave. They’re pre-trained by SSL, but then they’re fine-tuned by supervised learning, RLHF, etc. The grizzled old LLM people will tell you about the behavior of pure-SSL models, which everyone was using until about a year ago. They’re quite different. You cannot just ask them a question and expect them to spit out an answer. You have to prompt them in more elaborate ways.
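To make “prompt them in more elaborate ways” concrete: with a pure-SSL causal language model you typically frame the question as a document to be completed, for example a few-shot template whose natural continuation is the rating. A minimal sketch, assuming the Hugging Face transformers library, with gpt2 standing in for any base (non-fine-tuned) model; the template and the −10..+10 scale are made-up illustrations:

```python
# Sketch: eliciting a rating from a pure-SSL (base) causal LM by framing the
# question as a text-completion problem, rather than "asking" it directly.
# Assumes the Hugging Face `transformers` library; "gpt2" stands in for any
# model trained only with a self-supervised next-token objective.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Few-shot template: the model is never "asked" anything; it just continues
# the document, and the document is shaped so the continuation is a rating.
prompt = (
    "Situation: A person wins a small prize in a raffle.\n"
    "How the person feels (-10 to +10): +4\n\n"
    "Situation: A person misses their train and is late for work.\n"
    "How the person feels (-10 to +10): -5\n\n"
    "Situation: A person is reunited with an old friend.\n"
    "How the person feels (-10 to +10):"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=3,          # just enough tokens for a signed number
    do_sample=False,           # greedy continuation for determinism
    pad_token_id=tokenizer.eos_token_id,
)
completion = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:])
print(completion.strip())      # the model's continuation, not an "answer"
```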
(On a different topic, self-supervised pre-training before supervised fine-tuning is almost always better than supervised learning from random initialization, as far as I understand. Presumably if someone were following the OP protocol, which involves a supervised learning step, then they would follow all the modern best practices for supervised learning, and “start from a self-supervised-pretrained model” is part of those best practices.)
If you have a next-frame video predictor, you can’t ask it how a human would feel. You can’t ask it anything at all—except “what might be the next frame of thus-and-such video?”. Right?
Not exactly. You can extract embeddings from a video predictor (the activations of the next-to-last layer may do, or you can use techniques that enhance the semantic information captured in the embeddings), and then use supervised learning to train a simple classifier from the embedding to human feelings on a modest number of video/feelings pairs.
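For concreteness, here is a minimal sketch of that recipe in PyTorch. Everything about the video predictor below is a hypothetical stand-in for a real SSL-pretrained next-frame model; the point is only that the predictor stays frozen and a small probe is trained, with supervised learning, on the labeled video/feelings pairs:

```python
# Sketch of the "frozen embeddings + small supervised probe" recipe.
# The video predictor here is a stand-in (random weights) for a real
# SSL-pretrained next-frame model; only the probe is trained.
import torch
import torch.nn as nn

class TinyVideoPredictor(nn.Module):
    """Stand-in for a pretrained next-frame predictor.

    embed() returns the penultimate-layer activations ("embeddings");
    the actual next-frame head is irrelevant for the probe.
    """
    def __init__(self, frame_dim=1024, embed_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(frame_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim), nn.ReLU(),   # penultimate layer
        )
        self.next_frame_head = nn.Linear(embed_dim, frame_dim)  # unused below

    def embed(self, frames):
        # frames: (batch, frame_dim) -- flattened video frames for simplicity
        return self.encoder(frames)

predictor = TinyVideoPredictor()
for p in predictor.parameters():
    p.requires_grad_(False)        # freeze the "pretrained" predictor

probe = nn.Linear(256, 1)          # simple probe: embedding -> feeling score
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# A "modest number" of labeled video/feelings pairs (random stand-in data).
frames = torch.randn(64, 1024)     # 64 labeled clips
feelings = torch.randn(64, 1)      # human-rated scores, e.g. -10..+10

for step in range(200):
    with torch.no_grad():
        embeddings = predictor.embed(frames)   # frozen SSL features
    pred = probe(embeddings)
    loss = loss_fn(pred, feelings)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```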
I think that’s what I said in the last paragraph of the comment you’re responding to:
(On a different topic, self-supervised pre-training before supervised fine-tuning is almost always better than supervised learning from random initialization, as far as I understand. Presumably if someone were following the OP protocol, which involves a supervised learning step, then they would follow all the modern best practices for supervised learning, and “start from a self-supervised-pretrained model” is part of those best practices.)
Maybe that’s what PeterMcCluskey was asking about this whole time—I found his comments upthread to be pretty confusing. But anyway, if that’s what we’ve been talking about all along, then yeah, sure. I don’t think my OP implied that we would do supervised learning from random initialization. I just said “use supervised learning to train an ML model”. I was assuming that people would follow all the best practices for supervised learning—self-supervised pretraining, data augmentation, you name it. This is all well-known stuff—this step is not where the hard unsolved technical problems are. I’m open to changing the wording if you think the current version is unclear.
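As a closing illustration of the point the thread converges on (do the supervised step starting from a self-supervised-pretrained model rather than from random initialization), here is a minimal sketch using Hugging Face transformers. BERT is used purely as a familiar example of an SSL-pretrained (masked-token-prediction) model; the model choice and num_labels value are illustrative assumptions:

```python
# Sketch: the same supervised fine-tuning step can start either from an
# SSL-pretrained checkpoint or from random initialization. The classification
# head is newly initialized in both cases.
from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained("bert-base-uncased", num_labels=2)

# Best practice: supervised fine-tuning starting from SSL-pretrained weights.
model_pretrained = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Baseline: identical architecture, random initialization, same label budget.
model_random = AutoModelForSequenceClassification.from_config(config)

# Both would then go through the same supervised training loop on the labeled
# data; with modest label counts, the pretrained start is almost always far
# better, which is why it is part of standard supervised-learning practice.
```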