I thought your response would be that the H-JEPA network might be substantially faster, and so have a lower alignment tax than the exemplary LMCA.
LMCAs are much more interpretable than the base LLMs, because you’re deliberately breaking their cognition into small pieces, each of which is summarized by a natural language utterance. They’re particularly reliably interpretable if you call a new LLM instance for each piece to prevent Waluigi collapse effects, something I hadn’t thought of in that first post.
Because they can access sensory networks as external tools, LMCAs already have access to sensory grounding (although they haven’t been coded to use it particularly well in currently-published work). A more direct integration of sensory knowledge might prove critical, or at least faster.
I thought your response would be that the H-JEPA network might be substantially faster, and so have a lower alignment tax than the exemplary LMCA.
I discuss this in section 4.5. My intuition is that an LMCA with latency in the tens of minutes is basically as “powerful” (on the civilisational scale) as an agent with a latency of one second; there is no OODA-style edge in being swifter than tens of minutes. So I think that Eric Schmidt’s idea of a “millisecond-long war” (that is, a war where action unfolds at millisecond-scale cadence) just doesn’t make sense.
However, these are just my intuitions. They may be wrong, and flash attacks might be possible. In that case, GFlowNets could be useful because they could indeed work much faster than an LMCA[1].
That is, of course, only if training such GFlowNets even becomes tractable, which is not guaranteed. In section 3.4 I discuss the potential for this training to be orders of magnitude more expensive than training even GPT-5/6-level LLMs.
I agree that faster isn’t a clear win for most real-world scenarios. But it is more powerful, because you can have that agent propose many plans and consider more scenarios in the same amount of time. It’s also probably linked to being much more cost-efficient, in both compute and money, though I’m less sure about that last point.
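The throughput argument here is just arithmetic: with a fixed wall-clock budget, the number of plans an agent can sequentially propose scales inversely with its per-plan latency. A toy sketch (the budget and latency numbers are illustrative, not from the discussion):

```python
def plans_considered(budget_seconds: float, latency_per_plan_seconds: float) -> int:
    """How many plans an agent can sequentially propose within a wall-clock budget."""
    return int(budget_seconds // latency_per_plan_seconds)

BUDGET = 3600.0  # one hour of wall-clock time

# An LMCA with ~10-minute latency vs. a hypothetical 1-second-latency agent:
slow = plans_considered(BUDGET, 600.0)  # 6 plans per hour
fast = plans_considered(BUDGET, 1.0)    # 3600 plans per hour

print(slow, fast)  # prints "6 3600"
```

So even if neither agent gains an OODA-style edge from raw speed, the faster one can search a far larger space of candidate plans for the same wall-clock (and plausibly dollar) cost.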
An LMCA that uses a body of knowledge in the form of textbooks, scientific theories, and models can be updated very frequently and cheaply: essentially, every update of a scientific textbook is an update of the LMCA. There is no need to re-train anything.
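A minimal sketch of this pattern (all class and method names here are hypothetical, for illustration only): the agent consults an external, editable knowledge store at answer time rather than baking knowledge into its weights, so a write to the store immediately changes the agent’s behavior with no retraining.

```python
class KnowledgeStore:
    """Stands in for the textbooks and scientific models the LMCA consults."""

    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def update(self, topic: str, content: str) -> None:
        # Publishing a revised "textbook chapter" is just a write here.
        self._facts[topic] = content

    def lookup(self, topic: str) -> str:
        return self._facts.get(topic, "no entry")


class ToyLMCA:
    """Reads from the store at query time instead of memorizing it at training time."""

    def __init__(self, store: KnowledgeStore) -> None:
        self.store = store

    def answer(self, topic: str) -> str:
        return self.store.lookup(topic)


store = KnowledgeStore()
agent = ToyLMCA(store)

store.update("protein folding", "edition 1")
print(agent.answer("protein folding"))  # prints "edition 1"

store.update("protein folding", "edition 2")
print(agent.answer("protein folding"))  # prints "edition 2" -- same agent, no retraining
```

A GFlowNet trained end-to-end, by contrast, would have absorbed "edition 1" into its weights, and picking up "edition 2" would require another (costly, high-latency) training run.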
GFlowNets are at a disadvantage because they are trained for one particular version of the exemplary actor, drawing upon one particular version of the body of knowledge. And this training will be extremely costly (billions, tens of billions, or even hundreds of billions of USD?) and high-latency (months?). By the time a hypothetical GFlowNet training run is complete, the textbooks and models may already be outdated.
This consideration seriously challenges the economic and capability expediency of GFlowNet actors (as described in the post) versus LMCAs. The flexibility, deployability, configurability, and iterability of LMCAs may prove too strong a factor to overcome.