Seth Herd comments on Agentized LLMs will change the alignment landscape

Seth Herd 9 Apr 2023 23:39 UTC
2 points
0
First, the most unique part of your comment:
However, it seems more plausible to me that there are still fundamental barriers to producing a computer/software system able to act effectively in the world. I’d see the distinction between being/seeming generically intelligent (apparent smartness), as LLMs certainly seem and acting effectively in the world as difference between drawing correct answer to complex questions 80% of the time and making seemingly simpler judgements that relate to all aspects of the world but with a 99.9..% accuracy (plus having a complex systems of effective fall-backs). Essentially the difference between ChatGPT and a self-driving car. It seems plausible to me that such goal seeking can’t easily be instilled in an LLM or similar neural net by the standard training loop even though that loop tends to produce apparent smarts and produce it better over time.
I agree that it’s hard to put the type of error-checking that humans use into a standard NN loop. That’s why having an external code loop that calls multiple networks to check markers of accuracy and effectiveness is scary and promising.
I don’t think there are fundamental barriers. Sensory and motor networks, and types of senses and actions that people don’t have, are well along. And the HuggingGPT work shows that they’re surprisingly easy to integrate with LLMs. That plus error-checking are how humans successfully act in the real world.
Again, or perhaps I wasn’t clear: I’m still really hopeful that this is a bigger win than a danger. The upsides for alignment are huge relative to other approaches.
By reading an agents thoughts (with assistance from simple monitoring networks), we will get warning shots as it starts to think of plans that deviate from its goals. Even if the work is proprietary, people will be able and likely eager to publish the general ways it goes off track, so that improvements can stop those gaps.
- JoeTheUser 10 Apr 2023 19:46 UTC
  1 point
  0
  Parent
  I don’t think there are fundamental barriers. Sensory and motor networks, and types of senses and actions that people don’t have, are well along. And the HuggingGPT work shows that they’re surprisingly easy to integrate with LLMs. That plus error-checking are how humans successfully act in the real world.
  I don’t think the existence of sensors is the problem. I believe that self-driving cars, a key example, have problems regardless of their sensor level. I see the key hurdle as ad-hoc action in the world. Overall, all of our knowledge about neural networks, including LLMs, is a combination of heuristic observations and mathematical and other intuitions. So I’m not certain that this hurdle won’t be overcome but I’d still like put the reasons that it could be fundamental.
  What LLMs seems to do really well is pull together pieces of information and make deductions about them. What they seem to do less well is reconciling an “outline” of a situation with the particular details involved (Something I’ve found ChatGPT reliably does badly is reconciling further detail you supply once it’s summarized a novel). A human or even an animal, is very good at interacting with complex, changing, multilayered situations that they only have a partial understanding of—especially staying within various safe zones that avoid different dangers. Driving a car is an example of this—you have a bunch of intersecting constraints that can come from a very wide range of things that can happen (but usually don’t). Slowing (or not) when you see a child’s ball go into the road is an archetypal example.
  I mean, most efforts to use deep learning in robotics have foundered on the problem that generating enough information to teach the thing to act in the world is extremely difficult. Which implies that the only way that these things can be taught to deal with a complex situation is by roughly complete modeling of it and in real world action situations, that simply may not be possible (contrast with video games or board games where summary of the rules is given and an uncertainty is “known unknowns”).
  ...having an external code loop that calls multiple networks to check markers of accuracy and effectiveness is scary and promising.
  Maybe but methods like this have been tried without neural nets for a while and haven’t by themselves demonstrated effectiveness. Of course, some code could produce AGI then nautral LLMs plus some code could produce AGI so the question is how much needs to be added.