JoeTheUser comments on Agentized LLMs will change the alignment landscape

JoeTheUser 9 Apr 2023 20:30 UTC
2 points
0
Constructions like Auto-GPT, Baby AGI and so-forth are fairly easy to imagine. Just the greater accuracy of ChatGPT with “show your work” suggests them. Essentially, the model is a ChatGPT-like LLM given an internal state through “self-talk” that isn’t part of a dialog and an output channel to the “real world” (open internet or whatever). Whether these call the OpenAI api or use an open source model seems a small detail, both approaches are likely to appear because people are playing with essentially every possibility they can imagine.
If these structures really do beget AGI (which I’ll assume critically includes the capacity to act effectively in the world), then predictions of doom indeed seem neigh to being realized. The barrier to alignment here is that humans won’t be able to monitor the system’s self-talk simply because will come at too fast a speed and moreover, intent-to-undesirable may not be obviously. You could include another LLM in the system’s self-talk loop as well as including other filters/barriers to it’s real world access but all of these could be thwarted by a determined system—and a determined system is what people will aim to build (there was an article about GPT-4 supplying “jailbreaks” for GPT-4, etc). Just much, we’ve seen “intent drift” in practice with the various Bing Chat scare stories that made the rounds recently (before being limited, Bing Chat seemed drift in intent until it “got mad” and then became fixated. This isn’t strange because it’s a human behavior one can observe and predict online).
However, it seems more plausible to me that there are still fundamental barriers to producing a computer/software system able to act effectively in the world. I’d see the distinction between being/seeming generically intelligent (apparent smartness), as LLMs certainly seem and acting effectively in the world as difference between drawing correct answer to complex questions 80% of the time and making seemingly simpler judgements that relate to all aspects of the world but with a 99.9..% accuracy (plus having a complex systems of effective fall-backs). Essentially the difference between ChatGPT and a self-driving car. It seems plausible to me that such goal seeking can’t easily be instilled in an LLM or similar neural net by the standard training loop even though that loop tends to produce apparent smarts and produce it better over time. But hey, me and other skeptics could be wrong, in which case there’s reason to worry now.
- Seth Herd 9 Apr 2023 23:39 UTC
  2 points
  0
  Parent
  First, the most unique part of your comment:
  However, it seems more plausible to me that there are still fundamental barriers to producing a computer/software system able to act effectively in the world. I’d see the distinction between being/seeming generically intelligent (apparent smartness), as LLMs certainly seem and acting effectively in the world as difference between drawing correct answer to complex questions 80% of the time and making seemingly simpler judgements that relate to all aspects of the world but with a 99.9..% accuracy (plus having a complex systems of effective fall-backs). Essentially the difference between ChatGPT and a self-driving car. It seems plausible to me that such goal seeking can’t easily be instilled in an LLM or similar neural net by the standard training loop even though that loop tends to produce apparent smarts and produce it better over time.
  I agree that it’s hard to put the type of error-checking that humans use into a standard NN loop. That’s why having an external code loop that calls multiple networks to check markers of accuracy and effectiveness is scary and promising.
  I don’t think there are fundamental barriers. Sensory and motor networks, and types of senses and actions that people don’t have, are well along. And the HuggingGPT work shows that they’re surprisingly easy to integrate with LLMs. That plus error-checking are how humans successfully act in the real world.
  Again, or perhaps I wasn’t clear: I’m still really hopeful that this is a bigger win than a danger. The upsides for alignment are huge relative to other approaches.
  By reading an agents thoughts (with assistance from simple monitoring networks), we will get warning shots as it starts to think of plans that deviate from its goals. Even if the work is proprietary, people will be able and likely eager to publish the general ways it goes off track, so that improvements can stop those gaps.
  - JoeTheUser 10 Apr 2023 19:46 UTC
    1 point
    0
    Parent
    I don’t think there are fundamental barriers. Sensory and motor networks, and types of senses and actions that people don’t have, are well along. And the HuggingGPT work shows that they’re surprisingly easy to integrate with LLMs. That plus error-checking are how humans successfully act in the real world.
    I don’t think the existence of sensors is the problem. I believe that self-driving cars, a key example, have problems regardless of their sensor level. I see the key hurdle as ad-hoc action in the world. Overall, all of our knowledge about neural networks, including LLMs, is a combination of heuristic observations and mathematical and other intuitions. So I’m not certain that this hurdle won’t be overcome but I’d still like put the reasons that it could be fundamental.
    What LLMs seems to do really well is pull together pieces of information and make deductions about them. What they seem to do less well is reconciling an “outline” of a situation with the particular details involved (Something I’ve found ChatGPT reliably does badly is reconciling further detail you supply once it’s summarized a novel). A human or even an animal, is very good at interacting with complex, changing, multilayered situations that they only have a partial understanding of—especially staying within various safe zones that avoid different dangers. Driving a car is an example of this—you have a bunch of intersecting constraints that can come from a very wide range of things that can happen (but usually don’t). Slowing (or not) when you see a child’s ball go into the road is an archetypal example.
    I mean, most efforts to use deep learning in robotics have foundered on the problem that generating enough information to teach the thing to act in the world is extremely difficult. Which implies that the only way that these things can be taught to deal with a complex situation is by roughly complete modeling of it and in real world action situations, that simply may not be possible (contrast with video games or board games where summary of the rules is given and an uncertainty is “known unknowns”).
    ...having an external code loop that calls multiple networks to check markers of accuracy and effectiveness is scary and promising.
    Maybe but methods like this have been tried without neural nets for a while and haven’t by themselves demonstrated effectiveness. Of course, some code could produce AGI then nautral LLMs plus some code could produce AGI so the question is how much needs to be added.