Great and informative post! It seems to me that this architecture could enhance safety to some extent in the short term. Let's imagine an AI system similar to Auto-GPT, consisting of three parts: a large language model agent focused on creating stamps, a smaller language model dedicated to producing paperclips, and an even smaller scaffolding agent that leverages the two language models to devise plans for world domination. Individually, none of these systems possesses the intelligence to trigger an intelligence explosion or take over the world.

If such a system reaches the point of being capable of planning world domination, it is likely less dangerous than a single language model with that goal would be, since the agent providing the goal is too simple to grasp the importance of self-preservation and is further from superintelligence than the other parts. If so, scaffolding-like structures could be employed as a safety measure, and stop buttons might actually prove effective.

Am I mistaken in my intuition? What would likely be the result of an intelligence explosion in the above example? Paperclip maximizers?