H-JEPA probably won’t save us unless we already have an aligned LLM-based cognitive architecture.
My question is: if we already have an aligned LMCA, why would we use it to train a less interpretable H-JEPA AGI? Including this in the top level summary would be really useful for explaining the potential significance of your proposal.
To me, the whole notion that an H-JEPA architecture would save us is highly counterintuitive. As Steve Byrnes explained in detail, LeCun’s brainlike H-JEPA AGI architecture proposal really doesn’t contain an alignment proposal. Sure, it could be aligned if you got the rules in its decision component just right. But that’s just restating the whole alignment problem, which of course LeCun merrily assumes is actually easy—apparently without bothering to think about it.
So perhaps the significance here is: if LeCun’s proposed architecture turns out to work well, we’d want a way to align it, so here’s a proposal for doing it.
My question is: if we already have an aligned LMCA, why would we use it to train a less interpretable H-JEPA AGI?
First, it is not less interpretable. Here, Bengio and Hu argue that GFlowNets are more interpretable than auto-regressive LLMs; but in the setup where the energy function is not explicitly given (as it is in some other GFlowNet training setups, e.g., for drug discovery) but rather learned from examples (as I proposed in the post), GFlowNets don't have any interpretability advantage over the AI that generates the examples, which in this case is the aligned LMCA. Hence, we should consider the aligned LMCA and the H-JEPA agent with GFlowNet actors to have “the same level” of decent interpretability.
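To make that dependency concrete, here is a minimal, purely illustrative sketch of the learned-energy setup. The toy environment, model sizes, and the `lmca_examples` tensor (standing in for examples produced by the exemplary LMCA) are all assumptions for illustration, not details from the post; the backward-policy term of the trajectory-balance loss is omitted for brevity.

```python
# Sketch: GFlowNet actor trained against a reward exp(-E_phi(x)), where E_phi is
# itself learned from LMCA-generated examples. The GFlowNet's behaviour is then
# only as trustworthy/interpretable as those examples.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, HORIZON = 16, 4, 8

class EnergyModel(nn.Module):
    """E_phi(x): learned from examples, not given explicitly."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

class ForwardPolicy(nn.Module):
    """P_F(a | s): the GFlowNet actor's forward policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
    def forward(self, s):
        return torch.log_softmax(self.net(s), dim=-1)

energy, policy = EnergyModel(), ForwardPolicy()
log_Z = nn.Parameter(torch.zeros(()))  # learned estimate of the partition function

# Step 1: fit the energy model to examples from the exemplary actor (LMCA).
# `lmca_examples` is a placeholder tensor standing in for those examples.
lmca_examples = torch.randn(256, STATE_DIM)
opt_e = torch.optim.Adam(energy.parameters(), lr=1e-3)
for _ in range(100):
    # Toy objective: push the energy of LMCA examples below that of random
    # samples, so exp(-E) assigns them higher reward.
    loss_e = energy(lmca_examples).mean() - energy(torch.randn(256, STATE_DIM)).mean()
    opt_e.zero_grad(); loss_e.backward(); opt_e.step()

# Step 2: train the GFlowNet actor with a trajectory-balance-style loss
# against the learned reward exp(-E_phi).
opt_g = torch.optim.Adam(list(policy.parameters()) + [log_Z], lr=1e-3)
for _ in range(100):
    s, sum_log_pf = torch.zeros(STATE_DIM), torch.zeros(())
    for _ in range(HORIZON):                       # roll out one trajectory
        log_pf = policy(s)
        a = torch.distributions.Categorical(logits=log_pf).sample()
        sum_log_pf = sum_log_pf + log_pf[a]
        step = torch.zeros(STATE_DIM); step[a % STATE_DIM] = 1.0
        s = s + step                               # toy deterministic transition
    log_reward = -energy(s).detach()               # log R(x) = -E_phi(x)
    loss_g = (log_Z + sum_log_pf - log_reward) ** 2
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Whatever the GFlowNet ends up sampling is shaped entirely by `E_phi`, which in turn is shaped entirely by the examples: there is no extra interpretability to be gained beyond what the example-generating LMCA already offers.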
Second, answering the question: well, actually, it appears to me that there is not much reason (although when I started writing this article, it seemed to me that there would be some significant advantages). I’ve summarised this result in these two paragraphs of the “Conclusion” section:
The only aspect in which the x-risk profile of the H-JEPA agent with GFlowNet actors seems to be qualitatively different from that of the exemplary actor is the risk of direct access to the underlying Transformers, which is catastrophic in the case of the exemplary actor (section 2.3.1) and perhaps could be addressed completely in GFlowNet actors, if we accept that they will deliberately dumb themselves down in strategic x-risk analysis and planning (section 4.3). However, even if we accept this tradeoff, it might not reduce the overall x-risk to civilisation, because GFlowNet actors are not “self-sufficient” for training (section 4.4): the powerful LLM underlying the exemplary actor that is used to train GFlowNet actors would still have to be kept around, and the risk of direct access to that LLM remains.
Thus, the H-JEPA agent with GFlowNet actors could become really interesting perhaps only if the “LLM optimism” view proves to be correct, so that LMCAs generally work and can be satisfactorily aligned, but sensory grounding also proves to be a really important missing piece of the puzzle. (Though this combination of conditionals looks rather unlikely to me.) The proposed variant of H-JEPA would combine “the best of both worlds”: grounding from H-JEPA and aligned reasoning from the LMCA.
I thought your response would be that the H-JEPA network might be substantially faster, and so have a lower alignment tax than the exemplary LMCA.
LMCAs are much more interpretable than the base LLMs, because you’re deliberately breaking their cognition into small pieces, each of which is summarized by a natural-language utterance. They’re particularly reliably interpretable if you call a new LLM instance for each piece to prevent Waluigi collapse effects, something I hadn’t thought of in that first post.
Because they can access sensory networks as external tools, LMCAs already have access to sensory grounding (although they haven’t been coded to use it particularly well in currently-published work). A more direct integration of sensory knowledge might prove critical, or at least faster.
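A minimal sketch of this pattern, under assumptions: `fresh_llm_call` and `vision_tool` below are hypothetical stand-ins for a real LLM API and for a perception network exposed as an external tool; they return canned strings here so the sketch runs. The point is only to show the fresh-instance-per-step decomposition with a natural-language trace.

```python
from typing import List

def fresh_llm_call(prompt: str) -> str:
    # Hypothetical stand-in: in a real system, every call would use a brand-new
    # LLM instance/context, so no persona state (and no Waluigi-style drift)
    # carries over between steps. Returns a canned string so the sketch runs.
    return "Summarized next step based on the prompt. DONE"

def vision_tool(query: str) -> str:
    # Hypothetical sensory-grounding tool: a perception network exposed to the
    # LMCA as an external tool, returning a text description of what it sees.
    return "Observation: nothing unusual in view."

def run_lmca(task: str, max_steps: int = 10) -> List[str]:
    """Decompose the task into small steps; each step is a natural-language
    summary, so the whole trace of cognition stays human-readable."""
    trace: List[str] = []
    for _ in range(max_steps):
        observation = vision_tool(f"Observe anything relevant to: {task}")
        # A fresh instance sees only the task, the trace so far, and the observation.
        step_summary = fresh_llm_call(
            f"Task: {task}\nSteps so far: {trace}\nObservation: {observation}\n"
            "Propose the single next step in one sentence. Say DONE when finished."
        )
        trace.append(step_summary)   # the interpretable artifact
        if "DONE" in step_summary:
            break
    return trace

print(run_lmca("inspect the workspace for hazards"))
```

Because each step starts from a clean context, only the natural-language trace carries information between steps, which is what keeps the cognition inspectable.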
I thought your response would be that the H-JEPA network might be substantially faster, and so have a lower alignment tax than the exemplary LMCA.
I discuss this in section 4.5. My intuition is that an LMCA with a latency in the tens of minutes is basically as “powerful” (on the civilisational scale) as an agent with a latency of one second: there is no OODA-style edge in being swifter than tens of minutes. So I think that Eric Schmidt’s idea of a “millisecond-long war” (i.e., a war in which action unfolds at a millisecond-scale cadence) just doesn’t make sense.
However, these are just my intuitions. They may be wrong, and flash attacks might be possible. In that case, GFlowNets could indeed be useful, because they could work much faster than an LMCA[1].
That is, of course, if training such GFlowNets even becomes tractable, which is not guaranteed. In section 3.4, I discuss the potential for this training to be orders of magnitude more expensive than training even GPT-5/6-level LLMs.
I agree that faster isn’t a clear win for most real-world scenarios. But it is more powerful, because you can have that agent propose many plans and consider more scenarios in the same amount of time. It’s also probably linked to being much more cost-efficient, in compute and money, though I’m not sure about that last point.
An LMCA that uses a body of knowledge in the form of textbooks, scientific theories, and models can be updated very frequently and cheaply: essentially, every update of a scientific textbook is an update of the LMCA. Nothing needs to be re-trained.
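A minimal sketch of what that means in practice, with hypothetical names (`knowledge_base`, `call_llm` are placeholders, not anything from the post): the body of knowledge lives in an external store that is retrieved at call time, so swapping in a revised textbook changes behaviour immediately, with no re-training of any weights.

```python
from typing import Dict

# The external, editable "body of knowledge" (placeholder content).
knowledge_base: Dict[str, str] = {
    "thermodynamics": "Textbook, 1st edition: ...",
}

def call_llm(prompt: str) -> str:
    # Hypothetical LLM API; the frozen model weights are never touched.
    return f"(answer grounded in the provided reference) [{prompt[:40]}...]"

def lmca_answer(question: str, topic: str) -> str:
    # The relevant reference is retrieved and placed in the prompt at call time.
    return call_llm(f"Reference:\n{knowledge_base[topic]}\n\nQuestion: {question}")

# "Updating the LMCA" = editing the knowledge base; the very next call uses it.
knowledge_base["thermodynamics"] = "Textbook, 2nd edition, with corrections: ..."
print(lmca_answer("What does the second law imply here?", "thermodynamics"))
```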
GFlowNets are at a disadvantage here because they are trained against a very particular version of the exemplary actor, which in turn draws upon a particular version of the body of knowledge. And this training would be extremely costly (billions, or tens or even hundreds of billions, of USD?) and high-latency (months?). By the time a hypothetical GFlowNet training run completes, the textbooks and models may already be outdated.
This consideration really challenges the economic and capability expediency of GFlowNet actors (as described in the post) vs. LMCAs. The flexibility, deployability, configurability, and iterability of LMCAs may prove to be too strong a factor.