With humans in the loop, there actually is a way to implement ℓnovel. Unfortunately, computing the function takes as long as it takes for several humans to read a novel and aggregate their scores. And there’s also no way to compute the gradient. So by that point, it’s pretty much just a reinforcement learning signal.
However, you could use that human feedback to train a side network to predict the reward signal based on what the AI generates. This second network would then essentially compute a custom loss function (asymptotically approaching ℓnovel with more human feedback) that is amenable to gradient descent and can run far more quickly. That’s basically the idea behind reward modeling (https://youtube.com/watch?v=PYylPRX6z4Q).
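To make that concrete, here's a rough sketch of what I mean by the side network. Everything here is illustrative rather than a real recipe: in particular, I'm hand-waving the assumption that a candidate novel can be embedded into a fixed-size vector that is a differentiable function of the generator's parameters.

```python
# Minimal sketch of a learned reward model standing in for l_novel.
# Assumes (hypothetically) that a candidate novel can be embedded into a
# fixed-size vector; the reward model maps that embedding to a predicted
# human rating and is trained on whatever ratings have been collected.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, novel_embedding: torch.Tensor) -> torch.Tensor:
        # Predicted human rating for each embedded novel.
        return self.net(novel_embedding).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Phase 1: fit the reward model to human ratings (ordinary supervised regression).
def reward_model_step(novel_embeddings: torch.Tensor, human_scores: torch.Tensor) -> float:
    optimizer.zero_grad()
    predicted = reward_model(novel_embeddings)
    loss = nn.functional.mse_loss(predicted, human_scores)
    loss.backward()
    optimizer.step()
    return loss.item()

# Phase 2: use the (frozen) reward model as a differentiable stand-in for l_novel.
# If generator_embedding is a differentiable function of the generator's
# parameters, gradients flow back into the generator through this loss.
def l_novel_surrogate(generator_embedding: torch.Tensor) -> torch.Tensor:
    for p in reward_model.parameters():
        p.requires_grad_(False)
    return -reward_model(generator_embedding).mean()  # lower loss = higher predicted rating
```

The point is just that the slow, non-differentiable human judgment gets replaced by a fast, differentiable approximation once enough feedback has been collected.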
But yeah, framing such goals as loss functions probably gives the wrong intuition for how to approach aligning with them.
Interesting. I have the sense that we would have to get humans to reflect for years after reading a novel to produce a rating that, if optimized, would produce truly great novels. I think that when a novel really moves a person (or, even more importantly, moves a whole culture), it’s not at all evident that this has happened until (often) years after the fact.
I also have the sense that part of what makes a novel great is that a person or a culture decides to associate a certain beautiful insight with it, due to the novel’s role in provoking that insight. But usually the novel is only partly responsible for the insight; in part, we choose to make the novel beautiful by associating it in our culture with a beautiful thing (and this associating of beautiful things is a good and honest thing to do).
Well, then computing ℓnovel would just take a really long time.
So, it’s not impossible in principle if you trained the loss function as I suggested (loss function trained by reinforcement learning, then applied to train the actual novel-generating model), but it is a totally impractical approach.
If you really wanted to teach an AI to generate good novels, you’d probably start by training an LLM to imitate existing novels through some sort of predictive loss (e.g., categorical cross-entropy on next-token prediction) to give it a good prior. Then train another LLM to predict reader reviews or dissertations written by literary grad students, using the novels they’re based on as inputs, again with a similar predictive loss. (Pretraining both LLMs on some large corpus, as with GPT, could probably help with providing the necessary cultural context.) At the same time, use Mechanical Turk to get thousands of people to rate the sentiment of every review/dissertation, then train another LLM to predict the sentiment scores of all raters (or a low-dimensional projection of all their ratings), using the reviews/dissertations as input and something like MSE loss on the predicted sentiment scores. Then chain these latter two networks together to compute ℓnovel, to act as the prune to the first network’s babble, and train to convergence.
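Roughly, the final scoring chain might look like the sketch below. The classes and method names (ReviewPredictor-style models, a `.generate` call, etc.) are hypothetical stand-ins for the two LLMs I described, not real library APIs; the only point is how they chain into a single score used to prune candidates.

```python
# Sketch of chaining the review-predicting LLM and the sentiment-predicting LLM
# into a single l_novel score, used to prune candidate novels ("babble and prune").
import torch

@torch.no_grad()
def l_novel(novel_tokens, review_predictor, sentiment_predictor) -> torch.Tensor:
    # 1. Predict the review/dissertation a reader might write about this novel.
    predicted_review = review_predictor.generate(novel_tokens)
    # 2. Predict the panel of human sentiment ratings for that review
    #    (this model was trained with MSE against the Mechanical Turk scores).
    predicted_sentiments = sentiment_predictor(predicted_review)
    # 3. Collapse to a scalar; negate so that better novels get lower loss.
    return -predicted_sentiments.mean()

def prune(candidate_novels, review_predictor, sentiment_predictor, keep: int = 10):
    # Keep only the candidates with the lowest (best) l_novel score.
    scored = sorted(
        candidate_novels,
        key=lambda novel: l_novel(novel, review_predictor, sentiment_predictor).item(),
    )
    return scored[:keep]
```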
Honestly, though, I probably still wouldn’t trust the resulting system to produce good novels (or at least not with internally consistent plots, characterizations, and themes) if the LLMs were based on a Transformer architecture.
Interesting—why is that?
Mostly due to the limited working memory that Transformers typically use (e.g., a context window feeding only the most recent 512 tokens into the decoder). When humans write novels, they have to keep track of plot points, character sheets, thematic arcs, etc. across tens of thousands of words. You could probably get it to work, though, if you augmented the LLM with content-addressable memory and included a positional encoding that is aware of where in the novel (percentage-wise) each token resides.
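For that last part, one simple scheme (my own sketch, not a standard architecture) would be to add an embedding of each token's fractional position in the whole novel on top of whatever local positional encoding the Transformer already uses:

```python
# Sketch of a "percentage-wise" positional feature: add an embedding of each
# token's fractional position within the whole novel to its token embedding.
import torch
import torch.nn as nn

class FractionalPositionEncoding(nn.Module):
    def __init__(self, d_model: int, num_buckets: int = 100):
        super().__init__()
        self.num_buckets = num_buckets
        # One learned embedding per percentage bucket (0%..99% of the way through).
        self.bucket_embedding = nn.Embedding(num_buckets, d_model)

    def forward(self, token_embeddings: torch.Tensor, token_positions: torch.Tensor,
                novel_length: int) -> torch.Tensor:
        # token_positions: absolute indices of these tokens within the full novel.
        fraction = token_positions.float() / max(novel_length, 1)
        buckets = (fraction * self.num_buckets).long().clamp(max=self.num_buckets - 1)
        return token_embeddings + self.bucket_embedding(buckets)
```

So for a 120,000-token novel, a token at index 60,000 lands in bucket 50, and the model can tell it is at the midpoint of the book regardless of which 512-token window it currently appears in.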
I think you cut yourself off in the first paragraph.