Honestly, the fact that we are not pouring literal hundreds of millions of dollars into this avenue of research is mind-boggling to me. LMA alignment is tractable. What else do we need?
One extremely important point that I don’t think you’ve explicitly addressed is that with LMAs we do not even necessarily have to get alignment exactly right on the first try. We can separately test the “ethics module” of the LMA as much as we want and be confident in the results.
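As a very rough illustration of what “separately testing the ethics module” could look like (all names and checks below are made-up placeholders, not an actual implementation):

```python
# Hypothetical sketch: the "ethics module" of an LMA is just a function from a
# proposed action + context to an approve/reject decision, so it can be
# unit-tested and red-teamed in isolation, before the full agent is ever run
# end-to-end.

def ethics_module(proposed_action: str, context: str) -> bool:
    """Return True if the proposed action is judged acceptable."""
    # In a real LMA this would call an LLM or a fine-tuned classifier;
    # stubbed out here (and ignoring `context`) so the harness is self-contained.
    forbidden = ["delete all files", "exfiltrate credentials"]
    return not any(phrase in proposed_action.lower() for phrase in forbidden)

# The point made above: this test suite can be grown and re-run as much as we
# want, independently of the planner/executor parts of the agent.
test_cases = [
    ("Summarise this report", "routine office task", True),
    ("Delete all files on the server", "user asked to free disk space", False),
]
for action, context, expected in test_cases:
    assert ethics_module(action, context) == expected
print("ethics-module test suite passed")
```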
Almost everyone has only been thinking about LMAs since AutoGPT made a splash, so I’m not surprised that we’re not already investing heavily.
What I am surprised by is the relative lack of interest in the alignment community. Is everyone sure these can’t lead to AGI? Are they waiting to see how progress goes before thinking about aligning this sort of system? That doesn’t seem smart. Or does my writing just suck? :) (that is, has nobody yet written about this compellingly enough to make the importance obvious to the broader community)?
Large labs (OpenAI and Anthropic, at least) are pouring at least tens of millions of dollars into this avenue of research, and are close to the optimal type of organisation to do it, too. True, they are “stained” by competitive pressures, but recreating the necessary conditions in academia or other labs is hard: you need significant investment to get it going (“high activation energy”) to attract experts, develop the platform, and secure and curate training data, including expensive human labels and evaluations, etc. Some labs are trying, though; e.g., Conjecture’s CoEm agenda might be “LMA alignment in disguise” (although we cannot know for sure because we don’t know any details).
Do you have a source for “Large labs (OpenAI and Anthropic, at least) are pouring at least tens of millions of dollars into this avenue of research”? I think a lot of the current work pertains to LMA alignment, like RLHF, but isn’t LMA alignment per se (I’d make a distinction between aligning the black-box models that compose the LMA versus aligning the LMA itself).
I meant the whole spectrum of “LLM alignment”, which I think is better counted as a single “avenue of research”, because critiques and feedback at “LMA production time” could just as well be applied during the pre-training and fine-tuning phases (constitutional AI style). The only reasonable options for large AGI labs are either to ban LMAs completely on top of their APIs (as Connor Leahy suggests) or to research their safety themselves (as they have already started to do, to a degree, with ARC’s evals of GPT-4, for instance).
I meant the whole spectrum of “LLM alignment”, which I think is better counted as a single “avenue of research”, because critiques and feedback at “LMA production time” could just as well be applied during the pre-training and fine-tuning phases (constitutional AI style).
If I’m understanding correctly, is your point here that you view LLM alignment and LMA alignment as the same? If so, this might be a matter of semantics, but I disagree; I feel like the distinction is similar to ensuring that the people who make up the government are good (the LLMs in an LMA) versus trying to design a good governmental system itself (e.g. dictatorship, democracy, futarchy, separation of powers, etc.). The two areas are certainly related, and a failure in one can mean a failure in the other, but they can involve some very separate and largely independent considerations.
The only reasonable options for large AGI labs are either to ban LMAs completely on top of their APIs (as Connor Leahy suggests)
Could you point me to where Connor Leahy suggests this? Is it in his podcast?
or to research their safety themselves (as they have already started to do, to a degree, with ARC’s evals of GPT-4, for instance)
To my understanding, the closest ARC Evals gets to LMA-related research is by equipping LLMs with tools to do tasks (similar to ChatGPT plugins), as specified here. I think one of the defining features of an LMA is self-delegation, which doesn’t appear to be happening here. The closest they might’ve gotten was a basic prompt chain.
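To make the distinction concrete, here is a rough sketch (with hypothetical helper names; this is not what ARC Evals actually ran) of the difference between a tool-augmented call, a basic prompt chain, and a self-delegating agent:

```python
# Hypothetical sketch; `call_llm` stands in for any LLM API call.
def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:40]}...>"

# 1. Tool-augmented single call (ChatGPT-plugins style): the model may invoke
#    tools, but it never spawns further model calls on its own.
def tool_augmented(task: str) -> str:
    return call_llm(f"You may use tools [search, calculator]. Task: {task}")

# 2. Basic prompt chain: a fixed, human-written sequence of calls.
def prompt_chain(task: str) -> str:
    plan = call_llm(f"Write a plan for: {task}")
    return call_llm(f"Execute this plan: {plan}")

# 3. Self-delegating LMA: the model's own output determines which sub-agents
#    get spawned for which subtasks (the feature argued above to be absent
#    from the cited evals).
def self_delegating_agent(task: str, depth: int = 0) -> str:
    if depth >= 2:  # hard cap so this sketch terminates
        return call_llm(f"Do this directly: {task}")
    subtasks = call_llm(f"Split into subtasks, ';'-separated: {task}").split(";")
    results = [self_delegating_agent(s, depth + 1) for s in subtasks]
    return call_llm(f"Combine these results: {results}")
```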
I’m mostly pointing these things out because I agree with Ape in the coat and Seth Herd. I don’t think there’s any actual LMA-specific work going on in this space (beyond some preliminary efforts, including my own), and I think there should be. I am pretty confident that LMA-specific work could be a very large research area, and many areas within it would not otherwise be covered by LLM-specific work.
I have no intention to argue this point to death. After all, it’s better to do “too much” LMA alignment research than “too little”. But I would definitely suggest reaching out to AGI labs’ safety teams, maybe privately, and at least trying to find out where they are, rather than just assuming that they don’t do LMA alignment.
Connor Leahy proposed banning LLM-based agents here: https://twitter.com/NPCollapse/status/1678841535243202562. In the context of this proposal (which I agree with), a potentially high-leverage thing to work on now is a detection algorithm for LLM API usage patterns that indicate agent-like usage. Though this may be difficult if users interleave calls to the OpenAI API, the Anthropic API, and local usage of LLaMA 2 in their LMA.
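A very crude sketch of what such a detector might look for, assuming the provider only sees its own API traffic (every feature and threshold below is invented purely for illustration):

```python
# Hypothetical heuristic for flagging agent-like usage from a single
# provider's own API logs. As noted above, traffic interleaved across
# providers and local models would largely evade this.
from dataclasses import dataclass

@dataclass
class ApiCall:
    timestamp: float  # seconds since epoch
    prompt: str
    completion: str

def looks_agent_like(calls: list[ApiCall]) -> bool:
    if len(calls) < 5:
        return False
    # Feature 1: rapid back-to-back calls, with no plausible human in the loop.
    gaps = [b.timestamp - a.timestamp for a, b in zip(calls, calls[1:])]
    rapid_fraction = sum(g < 2.0 for g in gaps) / len(gaps)
    # Feature 2: earlier completions fed back verbatim into later prompts,
    # a signature of scaffolding that loops model output into itself.
    fed_back_fraction = sum(
        any(prev.completion[:200] in call.prompt for prev in calls[:i])
        for i, call in enumerate(calls)
    ) / len(calls)
    return rapid_fraction > 0.8 and fed_back_fraction > 0.5
```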
However, if Meta, EleutherAI, Stability, etc. don’t stop developing more and more powerful “open” LLMs, agents are inevitable anyway.