In my view, if something like Auto-GPT can work, its ability to work is probably not too sensitive to the exact implementation of the wrapper. If GPT-4 has the raw capability to orient itself to reality and navigate it, it should be able to do that with even bare-bones self-prompt/prompted self-reflection ability. Something like Auto-GPT should be more than enough. So the failure is suggestive; it is evidence about this whole landscape of approaches.
I agree that it’s possible that more nuanced wrapper designs would work, but I don’t place much probability on that.
I’m not at all confident Auto-GPT could achieve its goals; my claim is just that, in narrower domains, the specific system or arrangement of prompt interactions matters. To give a specific example, I goof around trying to get good longform D&D games out of ChatGPT. (Originally even GPT-2 fine-tuned on Crit Role transcripts.) Some implementations just work way better than others.
The trivial system is no system: just play D&D. Works great until it feels like the DM is the main character in Memento. The trivial next step is a rolling context window: the conversation fills up, you ask for a summary, and you start a new conversation with the summary. Just that is a lot better. But you really feel the loss of detail in the sudden jump, so why not make it continuous? A secretary GPT with one job: prune the DM GPT's conversation text after every question and answer, always trying to keep the most important and the most recent material. Smoother than the summary system. Maybe the secretary can keep some details instead of just deleting them; maybe it can use half its tokens for a permanent game-state. Then it can edit useful details in and out of the conversation history. Can the secretary write a text file for old conversations? Etc., etc.
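To make the secretary arrangement concrete, here is a minimal sketch. Everything in it is an assumption for illustration: complete(), count_tokens(), and render() are hypothetical helpers standing in for whatever chat API and tokenizer you use, and the prompts and budgets are made up.

```python
# Minimal sketch of the "secretary GPT" arrangement, assuming hypothetical
# helpers complete(messages) -> str (wraps your chat API), count_tokens(),
# and render() (turns message dicts back into plain text).

DM_PROMPT = "You are the DM of a long-running D&D campaign."
SECRETARY_PROMPT = (
    "You maintain memory for a D&D DM. Given the permanent game state and the "
    "recent exchanges, return an updated compact game state that keeps the most "
    "important and most recent details."
)
TOKEN_BUDGET = 3000      # rough budget for game state + kept history
KEEP_RECENT = 6          # most recent messages always kept verbatim

game_state = ""          # the "permanent game-state" slot the secretary curates
history = []             # recent DM/player exchanges kept verbatim

def dm_turn(player_message: str) -> str:
    """One player exchange with the DM model, followed by a secretary pass."""
    messages = [{"role": "system",
                 "content": f"{DM_PROMPT}\n\nGame state:\n{game_state}"}]
    messages += history
    messages.append({"role": "user", "content": player_message})
    reply = complete(messages)   # hypothetical LLM call

    history.append({"role": "user", "content": player_message})
    history.append({"role": "assistant", "content": reply})
    secretary_pass()             # prune after every question and answer
    return reply

def secretary_pass() -> None:
    """Ask a second model to fold old exchanges into the permanent game state."""
    global game_state, history
    if count_tokens(game_state) + count_tokens(render(history)) <= TOKEN_BUDGET:
        return
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    game_state = complete([
        {"role": "system", "content": SECRETARY_PROMPT},
        {"role": "user",
         "content": f"Game state:\n{game_state}\n\nExchanges to fold in:\n{render(old)}"},
    ])
    history = recent
```

The blunt summary version is roughly this with no verbatim history kept and the pass run only once the window overflows; the smoother feel comes from running the pass every turn and always keeping the most recent exchanges intact.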
Maybe the difference is that the user plays the D&D, so you know immediately when it’s not working well. It’s usually obvious within minutes. Auto-GPT is supposed to be automatic, so they add features and just kind of hope the AI figures it out from there. They don’t get the immediate “this is not working at all” feedback. Like they added embeddings 5 days ago: it just prints the words “Permanent memory:” in the prompt, followed by giant blobs of up to 2500 tokens of the most related text from Pinecone. That works great for chatbots answering a single question about technical documentation. It’s real easy to imagine how it could fall apart when done iteratively over longer time periods. I can’t imagine this would work for a D&D game; it might be worse than having no memory. My gut feeling is that you pull the 2500 most related tokens of content into your prompt and the system becomes more erratic overall. You get the wrong 2500 tokens, it overwhelms whatever the original prompt was, and now what is your agent up to? I just checked, and it has changed to “This reminds you of these events from your past:”. That might actually make it somewhat less likely to blow up, basically by making the context of the text more clear: “These are old events and thoughts, and you are reminded of them, don’t take this text too seriously, this text might not even be relevant so maybe you should even ignore it. It’s just some stuff that came to mind, that’s how memories work sometimes.”
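For reference, the embedding-memory step being described amounts to something like the sketch below. The function names (embed, nearest_chunks, truncate_to_tokens) are hypothetical placeholders rather than Auto-GPT's actual code; the header string and the 2500-token cap come from the description above.

```python
# Rough sketch of the retrieval-memory prompt assembly described above.
# embed(), nearest_chunks(), and truncate_to_tokens() are hypothetical
# stand-ins for whatever embedding model and vector store are in use.

MEMORY_HEADER = "This reminds you of these events from your past:"
MAX_MEMORY_TOKENS = 2500

def build_prompt(system_prompt: str, current_context: str, memory_store) -> str:
    query = embed(current_context)                        # embed the current working context
    recalled = nearest_chunks(memory_store, query, k=10)  # most similar stored text
    memory = truncate_to_tokens("\n".join(recalled), MAX_MEMORY_TOKENS)
    # Failure mode discussed above: if the recalled text is off-topic, these
    # 2500 tokens can easily outweigh the original instructions in the prompt.
    return f"{system_prompt}\n\n{MEMORY_HEADER}\n{memory}\n\n{current_context}"
```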
If GPT-4 has the raw capability to orient itself to reality and navigate it, it should be able to do that with even bare-bones self-prompt/prompted self-reflection ability.
GPT-4 by itself can’t learn; it can’t improve its intuitions and skills in response to new facts about the situations of its instances (facts that don’t fit in its context). So the details of how the prosthetics that compensate for that are implemented (or guided in how they are to be implemented) may well be crucial.
And also, at some point there will be open-sourced pre-trained and RLAIF-tuned models of sufficient scale that allow fine-tuning and so can improve their intuitions. At that point, running them inside an improved Auto-GPT successor might be more effective than starting the process from scratch, lowering the minimum necessary scale of the pre-trained foundation model.
Which increases the chances that the first AGIs are less intelligent than they would otherwise need to be. Which is bad for their ability to do better than humans at not building intentionally misaligned AGIs the first chance they get.
It’s also quite likely that something like Auto-GPT would work a lot better using a version of the LLM that had been fine-tuned/reinforcement-trained for this specific use case, just as ChatGPT is a lot more effective as a chatbot than the underlying GPT-3 model was before the specialized training. If the LLM is optimized for the wrapper, and the wrapper is designed to make efficient use of the entire context size of the LLM, things are going to work a lot better.
7 months later, we now know that this is true. Also, we now know that you can take output from a prompted/scaffolded LLM and use it to fine-tune another LLM to do the same things without needing the prompt/scaffold.
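In rough outline, that second claim is a distillation-style recipe: run the scaffolded model, log the end-to-end input/output pairs, and fine-tune a second model directly on them. A minimal sketch, where run_scaffolded() and fine_tune() are hypothetical stand-ins for the agent loop and training pipeline, not any particular library's API:

```python
# Sketch of the scaffold-to-fine-tune idea: collect input/output pairs from the
# prompted/scaffolded system, then train a bare model on them so it reproduces
# the behavior without the scaffold. run_scaffolded() and fine_tune() are
# hypothetical stand-ins for your agent loop and training pipeline.

import json

def collect_traces(tasks: list, path: str) -> None:
    """Log end-to-end task -> final answer pairs from the scaffolded model."""
    with open(path, "w") as f:
        for task in tasks:
            answer = run_scaffolded(task)              # full prompt/scaffold pipeline
            f.write(json.dumps({"prompt": task, "completion": answer}) + "\n")

def distill(path: str, base_model: str) -> str:
    """Fine-tune a bare model on the logged pairs; no scaffold at inference time."""
    return fine_tune(base_model, training_file=path)   # hypothetical training call
```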
Could you please point out the work you have in mind here?