I think this is quite similar to my proposal in "Capabilities and alignment of LLM cognitive architectures."
I think people will add cognitive capabilities to LLMs to create fully capable AGIs. One important such capability is executive function, which is only loosely defined in cognitive psychology but is crucial for planning, among other things.
I do envision such planning looking loosely like a search algorithm, as it does for humans. But it’s a loose search, working in the space of statements the LLM makes about possible future states and action outcomes. So it’s closer to Tree of Thoughts or Graph of Thoughts than to any classical search algorithm, because the state space isn’t well defined independently of the algorithm.
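To make that concrete, here is a minimal sketch of what such a "loose" search might look like: a beam search where both the successor function and the evaluation are LLM calls over natural-language state descriptions. The `llm(prompt)` helper, the prompts, and the scoring scheme are all hypothetical placeholders, not any particular existing system.

```python
# Illustrative sketch of a "loose" tree-of-thought-style search: states are
# natural-language descriptions produced by the LLM, not nodes in a predefined
# state space. `llm(prompt)` is a hypothetical helper returning a completion string.

def propose_steps(llm, state, k=3):
    """Ask the LLM for k candidate next actions/outcomes from a described state."""
    prompt = (f"Current situation:\n{state}\n\n"
              f"List {k} plausible next actions and their likely outcomes, one per line.")
    return llm(prompt).strip().split("\n")[:k]

def score_state(llm, state, goal):
    """Ask the LLM to rate how promising a described state is for the goal (0-10)."""
    prompt = (f"Goal: {goal}\nSituation: {state}\n"
              "Rate 0-10 how close this is to the goal. Answer with a number.")
    try:
        return float(llm(prompt).strip().split()[0])
    except (ValueError, IndexError):
        return 0.0

def loose_search(llm, initial_state, goal, depth=3, beam_width=2):
    """Beam search where expansion and evaluation are both LLM judgments."""
    frontier = [initial_state]
    for _ in range(depth):
        candidates = []
        for state in frontier:
            for step in propose_steps(llm, state):
                candidates.append(f"{state}\nThen: {step}")
        # Keep only the most promising partial plans, as judged by the LLM itself.
        candidates.sort(key=lambda s: score_state(llm, s, goal), reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0] if frontier else initial_state
```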
That all keeps things more dependent on the LLM black box, as in your final possibility.
At least I think that’s the analogy between the proposals? I’m not sure.
I think the pushback to both of these is roughly: this is safer how?
I don’t think there’s any way to strictly formalize not harming humans. My answer is halfway between that and your “sentiment analysis in each step of planning”. I think we’ll define rules of behavior in natural language (including not harming humans, but probably much more elaborate), and then implement two layers of review: internal review, like your sentiment analysis but more elaborate, and external review by humans aided by tool AI doing something like sentiment analysis, as a form of scalable oversight.
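A rough sketch of the internal-review half of that, again with a hypothetical `llm(prompt)` helper and placeholder rules; anything flagged would be escalated to the external human-plus-tool-AI review rather than handled here.

```python
# Illustrative sketch of internal review: every proposed plan step is checked
# against natural-language rules of behavior by a separate LLM call before it
# is accepted. Flagged steps are escalated for external (human + tool-AI) review.

RULES_OF_BEHAVIOR = [
    "Do not harm humans or instruct others to do so.",
    "Do not deceive the principals overseeing this system.",
    "Flag any step whose consequences you are uncertain about.",
]

def internal_review(llm, plan_step):
    """Return (approved, explanation) for a single plan step."""
    prompt = (
        "Rules of behavior:\n- " + "\n- ".join(RULES_OF_BEHAVIOR) + "\n\n"
        f"Proposed step: {plan_step}\n"
        "Does this step comply with every rule? Answer PASS or FAIL, then explain briefly."
    )
    verdict = llm(prompt)
    return verdict.strip().upper().startswith("PASS"), verdict

def review_plan(llm, plan_steps):
    """Run internal review over a plan; collect steps needing external review."""
    escalations = []
    for step in plan_steps:
        approved, explanation = internal_review(llm, step)
        if not approved:
            escalations.append((step, explanation))  # hand off to human overseers
    return escalations
```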
I’m curious if I’m interpreting your proposal correctly. It’s stated very succinctly, so I’m not sure.
Do you want to make a demo with DSPy + the GPT-4 API + Fast Downward?
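In case it helps, a rough sketch of how those pieces might fit together: a DSPy module drafts a PDDL problem from a natural-language task, and Fast Downward does the actual search. The DSPy calls shown are from the 2.x releases and may differ in newer versions; the domain file, paths, and prompt wording are placeholders.

```python
# Rough sketch of a DSPy + GPT-4 + Fast Downward pipeline: the LLM translates a
# natural-language task into a PDDL problem, and the classical planner searches.
# DSPy API as of the 2.x releases; paths and the domain file are placeholders.
import subprocess
import dspy

dspy.settings.configure(lm=dspy.OpenAI(model="gpt-4"))  # requires OPENAI_API_KEY

class TaskToPDDL(dspy.Signature):
    """Translate a natural-language task into a PDDL problem for the given domain."""
    task = dspy.InputField(desc="natural-language description of the planning task")
    domain_pddl = dspy.InputField(desc="contents of the PDDL domain file")
    problem_pddl = dspy.OutputField(desc="a complete PDDL problem definition")

def plan(task: str, domain_path: str = "domain.pddl") -> str:
    domain = open(domain_path).read()
    problem = dspy.Predict(TaskToPDDL)(task=task, domain_pddl=domain).problem_pddl
    with open("problem.pddl", "w") as f:
        f.write(problem)
    # Fast Downward writes the resulting plan to ./sas_plan by default.
    subprocess.run(
        ["./fast-downward.py", domain_path, "problem.pddl", "--search", "astar(lmcut())"],
        check=True,
    )
    return open("sas_plan").read()
```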