Steven Byrnes comments on Why GPT wants to mesa-optimize & how we might change this

Steven Byrnes 19 Sep 2020 16:09 UTC
LW: 9 AF: 4
AF
“In this instance, GPT has an incentive to do internal lookahead. But it’s unclear how frequently these situations actually arise”
I’m going with “very frequently, perhaps universally”. An example I came up with here was choosing “a” vs “an” which depends on the next word.
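To make the “a” vs “an” point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the word list, the probabilities, and the helper names `article_for` and `article_distribution` are made up, and nothing here is measured from GPT. It only shows that the right probability for the article is the total probability of the words that could follow it, so predicting the article well already requires some estimate of the next word.

```python
# Toy illustration only: the nouns and probabilities are made up,
# not taken from GPT or any real model.
next_noun_probs = {
    "apple": 0.30,
    "banana": 0.25,
    "orange": 0.20,
    "pear": 0.15,
    "egg": 0.10,
}

def article_for(word):
    # Crude assumption: spelling stands in for pronunciation.
    return "an" if word[0] in "aeiou" else "a"

def article_distribution(noun_probs):
    """P(article) = sum of P(noun) over the nouns that take that article."""
    dist = {"a": 0.0, "an": 0.0}
    for noun, p in noun_probs.items():
        dist[article_for(noun)] += p
    return dist

dist = article_distribution(next_noun_probs)
print(dist)
```

With these made-up numbers, “an” gets the mass of “apple”, “orange” and “egg”, and “a” gets the rest. The sketch says nothing about how a real model represents this internally.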
I think writing many, maybe most, sentences, requires some idea of how the sentence structure is going to be laid out, and that “idea” extends beyond the next token. Ditto at the paragraph level etc.
So I think it already does lookahead in effect, but I don’t think it does it by “beam search” per se. I think it’s more like “using concepts that extend over many tokens”, concepts like “this sentence has the following overall cadence...” and “this sentence conveys the following overall idea...” and “we’re in the middle of writing out this particular idiomatic phrase”. The training simultaneously incentivizes both finding the right extended concepts for where you’re at in the text, and choosing a good word in light of that context.
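For contrast, this is roughly what explicit beam search looks like, i.e. the mechanism the comment above doubts GPT uses internally. It is a sketch under stated assumptions: `MODEL` is a hypothetical toy bigram table with made-up probabilities, and `beam_search` is an illustrative helper, not code from GPT or any library.

```python
import math

# Hypothetical toy bigram model, P(next | current). Made-up numbers.
MODEL = {
    "<s>": {"a": 0.6, "an": 0.4},
    "a": {"dog": 0.3, "cat": 0.3, "bird": 0.4},
    "an": {"owl": 0.9, "eel": 0.1},
}

def beam_search(model, start, steps, width):
    """Keep the `width` most probable partial sequences at each step."""
    beams = [(0.0, [start])]  # (log-probability, tokens so far)
    for _ in range(steps):
        candidates = []
        for logp, tokens in beams:
            # Extend each surviving sequence by every possible next token.
            for tok, p in model.get(tokens[-1], {}).items():
                candidates.append((logp + math.log(p), tokens + [tok]))
        if not candidates:
            break
        # Sort by log-probability, highest first, and prune to the beam width.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:width]
    return beams[0]

greedy = beam_search(MODEL, "<s>", steps=2, width=1)  # no lookahead
beam = beam_search(MODEL, "<s>", steps=2, width=2)    # explicit lookahead
print(greedy[1], beam[1])
```

In this toy table, width 1 commits to the locally likelier “a” and ends on “a bird”, while width 2 keeps “an” alive long enough to find the more probable “an owl”. The “extended concepts” picture in the comment would reach a similar answer without enumerating candidates this way.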
- wkey22 27 Nov 2020 9:10 UTC
  LW: 10 AF: 5
  AF Parent
  I used your idea of “a” vs. “an” as the basis of a GPT-3 experiment to show that GPT-3 indeed probably does do lookahead. Details are at https://www.reddit.com/r/GPT3/comments/k0mvf3/experiment_that_shows_that_gpt3_can_probably_plan/
  - John_Maxwell 28 Nov 2020 6:11 UTC
    LW: 2 AF: 1
    AF Parent
    Thanks for sharing!
    - wkey222 28 Nov 2020 13:15 UTC
      3 points
      Parent
      You’re welcome, and thank you for your post also :). I posted an updated version of my experiment, which (hopefully) improves the logic of my prior experiment, at https://www.reddit.com/r/MachineLearning/comments/k2n3yv/d_an_experiment_that_shows_that_gpt3_can_plan/.
- John_Maxwell 19 Sep 2020 22:09 UTC
  LW: 7 AF: 5
  AF Parent
  This post distinguishes between mesa-optimization and learned heuristics. What you’re describing sounds like learned heuristics. (“Learning which words are easy to rhyme” was an example I gave in the post.) Learned heuristics aren’t nearly as worrisome as mesa-optimization because they’re harder to modify and misuse to do planning in unexpected domains. When I say “lookahead” in the post I’m pretty much always referring to the mesa-optimization sort.
  - Steven Byrnes 20 Sep 2020 0:24 UTC
    LW: 5 AF: 3
    AF Parent
    Suppose I said (and I actually believe something like this is true):
    “GPT often considers multiple possibilities in parallel for where the text is heading—including both where it’s heading in the short-term (is this sentence going to end with a prepositional phrase or is it going to turn into a question?) and where it’s heading in the long-term (will the story have a happy ending or a sad ending?)—and it calculates which of those possibilities are most likely in light of the text so far. It chooses the most likely next word in light of this larger context it figured out about where the text is heading.”
    If that’s correct, would you call GPT a mesa-optimizer?
    - John_Maxwell 20 Sep 2020 0:40 UTC
      LW: 8 AF: 5
      AF Parent
      Well I suppose mesa-optimization isn’t really a binary is it? Like, maybe there’s a trivial sense in which self-attention “mesa-optimizes” over its input when figuring out what to pay attention to.
      
      But ultimately, what matters isn’t the definition of the term “mesa-optimization”, it’s the risk of spontaneous internal planning/optimization that generalizes in unexpected ways or operates in unexpected domains. At least in my mind. So the question is whether this considering multiple possibilities about text stuff could also improve its ability to consider multiple possibilities in other domains. Which depends on whether the implementation of “considering multiple possibilities” looks more like beam search vs very domain-adapted heuristics.
      - Steven Byrnes 20 Sep 2020 2:26 UTC
        LW: 6 AF: 3
        AF Parent
        I think the Transformer is successful in part because it tends to solve problems by considering multiple possibilities, processing them in parallel, and picking the one that looks best. (Selection-type optimization.) If you train it on text prediction, that’s part of how it will do text prediction. If you train it on a different domain, that’s part of how it will solve problems in that domain too.
        I don’t think GPT builds a “mesa-optimization infrastructure” and then applies that infrastructure to language modeling. I don’t think it needs to. I think the Transformer architecture is already raring to go forth and mesa-optimize, as soon as you give it any optimization pressure to do so.
        So anyway your question is: can it display foresight / planning in a different domain without being trained in that domain? I would say, “yeah probably, because practically every domain is instrumentally useful for text prediction”. So somewhere in GPT-3’s billions of parameters I think there’s code to consider multiple possibilities, process them in parallel, and pick the best answer, in response to the question of “What will happen next when you put a sock in a blender?” or “What is the best way to fix an oil leak?”, meaning not just those literal words as a question, but the concepts behind them, however they’re invoked.
        (Having said that, I don’t think GPT-3 specifically will do side-channel attacks, but for other, unrelated reasons that are off-topic here. Namely, I don’t think it is capable of making the series of new insights required to develop an understanding of itself and its situation and then take appropriate actions. That’s based on my speculations here.)