LLM+Planners hybridisation for friendly AGI

Every LLM in existence is a black box, and alignment that relies on tuning the black box has not succeeded: that is evident from the fact that even models like ChatGPT get jailbroken constantly. Moreover, black-box tuning has no reason to transfer to bigger models.

A new architecture is required. I propose using an LLM to parse the environment into a planner format such as STRIPS, and then using an algorithmic planner such as Fast Downward to implement agentic behaviour. The produced plan is then parsed back into natural language, or into commands to be executed automatically. Such an architecture would also be commercially desirable and would disincentivise investment in bigger monolithic models.
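To make the loop concrete, here is a minimal sketch in Python. It assumes a local Fast Downward checkout (its driver script writes the solution to a `sas_plan` file); `llm_to_pddl` and `plan_to_nl` are hypothetical stand-ins for LLM API calls, not functions from the linked draft.

```python
# Pipeline sketch: LLM -> PDDL -> Fast Downward -> natural language.
# Assumes ./fast-downward.py from a local Fast Downward checkout;
# llm_to_pddl and plan_to_nl are hypothetical LLM-call placeholders.
import subprocess
from pathlib import Path

def llm_to_pddl(observation: str) -> tuple[str, str]:
    """Translate an environment description into a (domain, problem) PDDL pair."""
    raise NotImplementedError  # placeholder for an LLM API call

def solve(domain: str, problem: str) -> list[str]:
    """Run Fast Downward on the generated PDDL and return the plan steps."""
    Path("domain.pddl").write_text(domain)
    Path("problem.pddl").write_text(problem)
    subprocess.run(
        ["./fast-downward.py", "domain.pddl", "problem.pddl",
         "--search", "astar(lmcut())"],
        check=True,
    )
    # Fast Downward writes the solution to ./sas_plan; lines starting
    # with ';' are cost comments, not actions.
    lines = Path("sas_plan").read_text().splitlines()
    return [line for line in lines if not line.startswith(";")]

def plan_to_nl(plan: list[str]) -> str:
    """Render the symbolic plan back into natural language or commands."""
    raise NotImplementedError  # placeholder for an LLM API call

def act(observation: str) -> str:
    domain, problem = llm_to_pddl(observation)
    return plan_to_nl(solve(domain, problem))
```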

Draft of the architecture: https://gitlab.com/anomalocaribd/prometheus-planner/-/blob/main/architecture.md

q: What part of the alignment problem does this plan aim to solve?

a: Defining hard constraints on AI behaviour.

q: How does this plan aim to solve the problem?

a: A planner would either be hard-constrained with a formal-language definition of not harming humans, or would incorporate sentiment analysis at each step of planning.
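A hedged sketch of the second option: every action in the plan is scored by a harm classifier before execution. Here `harm_score` stands in for any real sentiment/harm model, and the 0.5 threshold is purely illustrative. The first option would instead live in the PDDL domain itself, e.g. as a precondition every action must satisfy.

```python
# Per-step vetting sketch: reject a plan if any action is flagged harmful.
# harm_score is a hypothetical stand-in for a real classifier.
def harm_score(action: str) -> float:
    """Return a score in [0, 1], higher meaning more likely harmful."""
    raise NotImplementedError  # placeholder for a real sentiment/harm model

def vet_plan(plan: list[str], threshold: float = 0.5) -> list[str]:
    for step in plan:
        if harm_score(step) > threshold:
            raise ValueError(f"plan rejected: step {step!r} flagged as harmful")
    return plan
```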

q: What evidence is there that the methods will work?

a: https://arxiv.org/abs/2304.11477 (LLM+P: Empowering Large Language Models with Optimal Planning Proficiency)

q: What are the most likely causes of this not working?

a: STRIPS is a very primitive planning language, and for more expressive ones, planners need to be developed almost from scratch. This approach might once again teach us the bitter lesson if a fully algorithmic planner proves infeasible to implement. A naive implementation will also suffer from combinatorial explosion almost immediately. All of this may be mitigated by integrating LLM functionality deeper into the planning process, as sketched below. Even though that reintroduces the problem of black boxes, the architecture would still enable some degree of compartmentalisation, meaning we would only have to contend with smaller, more easily interpretable models.
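One way to read "integrating LLM functionality into the planning process" is LLM-guided search: the model shortlists plausible actions at each state, so the planner never faces the full branching factor. A sketch under that assumption, where `propose_actions` is a hypothetical LLM call and the state/action representations are left abstract:

```python
# LLM-pruned breadth-first search: expand only actions the LLM shortlists,
# trading completeness for a bounded branching factor.
from collections import deque

def propose_actions(state: frozenset, k: int = 5) -> list:
    """LLM-ranked shortlist of applicable actions (hypothetical placeholder)."""
    raise NotImplementedError

def guided_bfs(start: frozenset, is_goal, apply_action) -> list:
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for action in propose_actions(state):
            nxt = apply_action(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    raise RuntimeError("no plan found within the pruned search space")
```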